PREFACE



This manual describes the functions of CRAY X-MP four-processor computer 
systems. It is written to assist programmers and engineers and assumes a 
familiarity with digital computers. 

This manual describes the overall computer system, its configurations, 
and equipment. It also describes the operation of the Central Processing 
Units (CPUs) that execute instructions, provide memory protection, report 
hardware exceptions, and provide interprocessor communications within the 
system. 

The following publications give details of the I/O Subsystem (IOS), the 
disk storage units (DSUs), and the Solid-state Storage Device (SSD): 

HR-0030 I/O Subsystem Hardware Reference Manual 

HR-0031 SSD Solid-state Storage Device Hardware Reference Manual 

HR-0630 Mass Storage Subsystem Hardware Reference Manual 

HR-0077 Disk Systems Hardware Reference Manual 



/////////////////////////////////////////////////////// 

WARNING 

This equipment generates, uses, and can radiate radio 
frequency energy and if not installed and used in 
accordance with the instructions manual, may cause 
interference to radio communications. It has been 
tested and found to comply with the limits for a 
Class A computing device pursuant to Subpart J of Part 
15 of FCC Rules, which are designed to provide 
reasonable protection against such interference when 
operated in a commercial environment. Operation of 
this equipment in a residential area is likely to cause 
interference, in which case, the user at his own 
expense will be required to take whatever measures may 
be required to correct the interference. 

/////////////////////////////////////////////////////// 
SYSTEM DESCRIPTION



CRAY X-MP four-processor computer systems are powerful, general purpose 
machines that contain four central processing units (CPUs). Like all 
CRAY X-MP multiprocessor systems, they are able to achieve extremely high 
multiprocessing rates by efficiently using the scalar and vector 
capabilities of all CPUs combined with the systems' solid-state, 
random-access memory (RAM) and shared registers. 

Vector processing is the performance of iterative operations on sets of 
ordered data. When two or more vector operations are chained together, 
two or more operations can be executing simultaneously; therefore, the
computational rate for vector processing greatly exceeds the computational
rate of conventional scalar processing. Scalar operations complement
the vector capability by providing solutions to problems not readily 
adaptable to vector techniques. 

The machine has very high performance levels, and equipment options allow 
systems to be configured for a particular use. Central Memory of the 
four-processor mainframe can be 4 million (model 44), 8 million (model 
48), or 16 million (model 416) 64-bit words (refer to table 1-1). The 
system is compatible with all existing models of the Cray I/O Subsystem 
(IOS) and its associated mass storage subsystem. In addition, an 
optional high-performance Cray Research, Inc. (CRI) SSD Solid-state 
Storage Device can be attached to the mainframe. Figure 1-1 shows the 
mainframe with a Cray IOS and an SSD. 

This section describes system components and configurations. Table 1-1 
gives overall system characteristics. 
Figure 1-1. CRAY X-MP Model 48 Mainframe with a
Cray I/O Subsystem and a Solid-state 
Storage Device 
Table 1-1. CRAY X-MP Four-processor System Characteristics

Configuration     • Mainframe with four CPUs
                  • IOS with 2, 3, or 4 I/O Processors (IOPs)
                  • Optional SSD

8.5-ns clock      • 8.5-ns CPU CP
CPU speed         • 117 million floating-point additions per second
                  • 117 million floating-point multiplications per second
                  • 117 million half-precision floating-point divisions
                    per second
                  • 37 million full-precision floating-point divisions per
                    second
                  • Simultaneous floating-point addition, multiplication,
                    and reciprocal approximation

9.5-ns clock      • 9.5-ns CPU CP
CPU speed         • 105 million floating-point additions per second per
                    CPU
                  • 105 million floating-point multiplications per second
                    per CPU
                  • 105 million half-precision floating-point divisions
                    per second per CPU
                  • 33 million full-precision floating-point divisions
                    per second per CPU
                  • Simultaneous floating-point addition, multiplication,
                    and reciprocal approximation within each CPU

Memories          • Mainframe has 4 million (model 44), 8 million (model
                    48), or 16 million (model 416) 64-bit words in
                    Central Memory

Input/Output      • Two 1250 Mbyte per second channel pairs for interface
                    to SSD
                  • Four 100 Mbyte per second channel pairs for interface
                    to IOS
                  • Four 6 Mbyte per second channel pairs

Physical          • 64 sq ft (5.94 m²) floor space for the mainframe
                  • 15 sq ft (1.39 m²) floor space for the IOS
                  • 15 sq ft (1.39 m²) floor space for the SSD
                  • 5.65 tons (5.12 Mg), mainframe weight
                  • 1.5 tons (1.36 Mg), IOS weight
                  • 1.5 tons (1.36 Mg), SSD weight
                  • Liquid refrigeration of each chassis
                  • 400-Hz power from motor-generators
CONVENTIONS

This manual uses the following conventions. 

ITALICS 

Italicized lowercase letters, such as jk, indicate variable information. 

REGISTER CONVENTIONS 

Parenthesized register names are used frequently as a form of shorthand
notation for the expression "the contents of register." For example,
"Branch to (P)" means "Branch to the address indicated by the contents of
register P."

Designations for the A, B, S, T, and V registers are used extensively.
For example, "Transmit (Tjk) to Si" means "Transmit the contents of the
T register specified by the jk designators to the S register specified
by the i designator."

Register bits are numbered right to left as powers of 2, starting with
2^0. Bit 2^63 of an S, V, or T register value represents the most
significant bit. Bit 2^23 of an A or B register value represents the
most significant bit. (A and B registers are 24 bits.) The numbering
conventions for the Exchange Package and the Vector Mask register are
exceptions. Bits in the Exchange Package are numbered from left to right
and are not numbered as powers of 2 but as bits 0 through 63, with bit 0
as the most significant and bit 63 as the least significant. The Vector
Mask register has 64 bits, each corresponding to a word element in a
vector register. Bit 2^63 corresponds to element 0; bit 2^0
corresponds to element 63.
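
The two exceptions to the power-of-2 numbering can be illustrated with a short Python sketch. It assumes 64-bit values, that Exchange Package bit 0 is the most significant bit, and that Vector Mask bit 2^63 corresponds to element 0; the function names are hypothetical and are not part of any Cray software.

```python
WORD_BITS = 64  # assumed width of an Exchange Package word and of the Vector Mask


def exchange_bit_to_power(bit_number):
    """Convert an Exchange Package bit number (0 = most significant)
    to the equivalent power-of-2 position used for other registers."""
    if not 0 <= bit_number < WORD_BITS:
        raise ValueError("bit number must be 0 through 63")
    return WORD_BITS - 1 - bit_number


def vector_mask_power_for_element(element):
    """Return the power-of-2 position of the Vector Mask bit for an element."""
    if not 0 <= element < WORD_BITS:
        raise ValueError("element must be 0 through 63")
    return WORD_BITS - 1 - element


def element_is_selected(vector_mask, element):
    """True if the Vector Mask bit corresponding to the element is 1."""
    return (vector_mask >> vector_mask_power_for_element(element)) & 1 == 1
```

For example, a mask with only bit 2^63 set selects element 0 alone, and a mask with only bit 2^0 set selects element 63 alone.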



NUMBER CONVENTIONS 

Unless otherwise indicated, numbers are decimal numbers. Octal numbers 
are indicated with an 8 subscript. Exceptions are register numbers, 
channel numbers, instruction parcels in instruction buffers, and 
instruction forms, which are given in octal without the subscript. 



CLOCK PERIOD 

The basic unit of CPU computation time is the clock period (CP). For 
mainframes with serial numbers 406 and above, the CP is 8.5 ns. For
mainframes with serial numbers 405 and below, the CP is 9.5 ns.
Instruction issue, memory references, and other timing considerations are 
often measured in CPs. 
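
As a rough illustration of how the CP relates to the rates quoted in table 1-1, the following Python sketch converts a CP length into a peak result rate and a CP count into elapsed time. It assumes that a functional unit can deliver one result per CP, which is an assumption of this sketch; the function names are hypothetical.

```python
def cps_to_seconds(clock_periods, cp_ns):
    """Elapsed time in seconds for a number of clock periods."""
    return clock_periods * cp_ns * 1e-9


def peak_results_per_second(cp_ns, results_per_cp=1):
    """Peak result rate, assuming a fixed number of results per CP."""
    return results_per_cp / (cp_ns * 1e-9)


if __name__ == "__main__":
    for cp_ns in (8.5, 9.5):
        millions = peak_results_per_second(cp_ns) / 1e6
        print(f"{cp_ns}-ns CP: about {int(millions)} million results per second")
```

Under that assumption, an 8.5-ns CP gives about 117 million results per second and a 9.5-ns CP gives about 105 million, which agrees with the addition and multiplication rates in table 1-1.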
SYSTEM COMPONENTS

The four-processor system consists of a mainframe and an IOS. Mass 
storage devices, front-end interfaces, and optional tape devices are also 
integral parts of a system. Optionally, a Cray SSD can be part of the 
system. Supporting this equipment are condensing units for 
refrigeration, motor-generators to provide system power, and power 
distribution units for the mainframe, IOS, and SSD. The following pages 
describe the system components. 



CENTRAL PROCESSING UNITS 

Each CPU has independent control and computation sections. All CPUs 
share Central Memory and the inter-CPU communication and I/O sections. 
(CPU sections are described in later sections.) Figure 1-2 shows the 
mainframe chassis. Figure 1-2 illustrates the basic organization of the 
computer; figure 1-3 illustrates the components and control and data 
paths of each CPU in the system. 



Figure 1-2. Basic Organization of the Four-processor System
(Block diagram: a control section for each of the four CPUs, with control,
interrupt, and status registers; shared, semaphore, and real-time clock
registers; a memory section of 4, 8, or 16 million 64-bit words; and four
6 Mbyte, two 1250 Mbyte, and four 100 Mbyte per second channel pairs.)
Figure 1-3. Control and Datapaths for a Single CPU

†   The Vector Pop/Parity unit shares its input path with the Reciprocal
    Approximation unit.

††  The Second Vector Logical unit shares its input and output path with
    the Floating-point Multiply unit.

††† Second Vector Logical and Index Generation units are not available on
    all systems.
INTERFACES

The Cray mainframe is designed for use with front-end computers in a 
computer network. A front-end computer system is self-contained and 
executes under the control of its own operating system. 

Standard interfaces connect the Cray mainframe's I/O channels to channels
of front-end computers, providing input data to the Cray mainframe and
receiving output from it for distribution to peripheral equipment.
Interfaces compensate for differences in channel widths, machine word
size, electrical logic levels, and control signals. The Master I/O
Processor (MIOP) of the IOS communicates with the mainframe through a 6
Mbyte per second channel pair to a channel adapter module in the Cray
mainframe. Communication then continues through a front-end interface to
the front-end computer, typically through a front-end computer I/O
channel.

A stand-alone cabinet located near the host computer houses the front-end
interface (figure 1-4). Its operation is invisible to the front-end
computer user and the Cray user.

A primary goal of the interface is to maximize the use of the front-end 
channel connected to the Cray system. Since the MIOP channel connected 
to the interface is faster than any front-end channel connected to the 
interface, the burst rate of the interface is limited by the maximum rate 
of the front-end channel. 

Interfaces to front-end computers allow the front-end computers to 
service the Cray computer system in the following ways: 

• As a master operator station 

• As a local operator station 

• As a local batch entry station 

• As a data concentrator for multiplexing several other stations 
into a single Cray channel 

• As a remote batch entry station 

• As an interactive communication station 

Peripheral equipment attached to the front-end computer varies depending 
on the use of the Cray system. 
Figure 1-4. Typical Interface Cabinet



I/O SUBSYSTEM 

The IOS, shown in figure 1-5, is standard on the CRAY X-MP system and has 
two, three, or four IOPs, Buffer Memory, and required interfaces. The 
IOS is designed to provide fast data transfer between its Buffer Memory 
and the mainframe's Central Memory as well as front-end computers, 
peripheral devices, and storage devices. 

Four types of IOPs may be configured in an IOS: an MIOP, a Buffer IOP 
(BIOP), a Disk IOP (DIOP), and an Auxiliary IOP (XIOP). All IOSs must 
have at least one MIOP and one BIOP. The number of DIOPs and XIOPs is 
site dependent. 

Each IOP of the IOS has a memory section, a control section, a 
computation section, and an input/output (I/O) section. I/O sections are 
independent and handle some portion of the I/O requirements for the IOS. 
Each IOP also has six direct memory access (DMA) ports to its Local 
Memory. 

The MIOP controls the front-end interfaces and the standard group of
station* peripherals. The Peripheral Expander interfaces the station
peripherals to one DMA port of the MIOP. The MIOP also connects to
Buffer Memory and to the mainframe over a 6 Mbyte per second channel
pair.



* The term station means both hardware and software. Station is the
  link to the front end or can act as a limited front-end system (as the
  MIOP).
Figure 1-5. I/O Subsystem Chassis

The BIOP is the main link between the mainframe's Central Memory and the
mass storage devices. Data from mass storage is transferred through the 
BIOP's Local Memory to the mainframe's Central Memory through a 100 Mbyte 
per second channel pair. 

The DIOP is used for additional disk storage units (DSUs). This 
processor can handle up to four disk controller units (DCUs) with up to 
16 disk storage units. The DIOP uses one DMA port for each controller, 
one DMA port to connect to Buffer Memory, and another DMA port to connect 
a 100 Mbyte per second channel pair to the mainframe Central Memory. 

The XIOP is used for block multiplexer channels and interfaces to a 
maximum of four BMC-4 Block Multiplexer Controllers. Each controller can 
handle up to four block multiplexer channels. The XIOP uses one DMA port 
for each controller and another DMA port to connect with Buffer Memory. 

IOS hardware allows for simultaneous data transfers between the BIOP,
MIOP, and DIOP or XIOP of the IOS and the mainframe's Central Memory.†

Section 2 describes the CPU I/O section for the system. Refer to the I/O 
Subsystem Hardware Reference Manual for a complete description of the IOS. 



DISK STORAGE UNITS 

For mass storage, the system uses CRI disk storage units. A disk 
controller unit interfaces the disk storage units with an IOP of an IOS 
through one DMA port. Up to four disk storage units can be connected to 
a single disk controller unit. 

The IOP and the disk controller unit can transfer data between the DMA 
port and four disk storage units with all disk storage units operating at 
full speed without missing data or skipping revolutions. A minimum of 2 
and a maximum of 48 disk storage units can be configured on an IOS. The 
IOS chassis houses the disk controller unit. 

Each disk storage unit has two accesses for connecting it to 
controllers. The second independent datapath to each disk storage unit 
exists through another CRI controller. Reservation logic provides 
controlled access to each disk storage unit. Dynamic sharing of devices 
is not supported by the Cray operating system COS software. The Disk 
Systems Hardware Reference Manual includes further information about the 
mass storage subsystem. 



† Software to support the 100 Mbyte per second channel pair to the MIOP
  or XIOP is not currently available.
SOLID-STATE STORAGE DEVICE

The SSD, shown in figure 1-6, is used for temporary data storage and
transfers data to and from the mainframe's Central Memory. The transfer
speed is dependent on the SSD memory size and configuration as described
in the SSD Solid-state Storage Device Hardware Reference Manual. The
maximum speed attained from the SSD to Central Memory is 1250 Mbytes/s
for each 1250 Mbyte per second channel.




Figure 1-6. Solid-state Storage Device Chassis 
CONDENSING UNITS

Condensing units (figure 1-7) contain the major components of the 
refrigeration system used to cool the computer chassis and consist of two 
25-ton condensers. Heat is removed from the condensing unit by a 
second-level cooling system that is not part of the computer system. 
Freon, which cools the computer, picks up heat and transfers it to water 
in the condensing unit. 







Figure 1-7. Condensing Unit 
POWER DISTRIBUTION UNITS

The Cray mainframe, IOS, and SSD all operate from 400-Hz, three-phase 
power. The mainframe, IOS, and SSD have independent power distribution 
units. 

The power distribution unit for the mainframe contains adjustable 
transformers for regulating the voltage to each column of the mainframe. 
The power distribution unit also contains temperature and voltage 
monitoring equipment that checks temperatures at strategic locations on 
the mainframe chassis. Automatic warning and shutdown circuitry protects 
the mainframe in case of overheating or excessive cooling. Control 
switches for the motor-generators and the condensing unit are mounted on 
the mainframe power distribution unit. 

A smaller power distribution unit performs similar functions for the IOS 
chassis or the SSD chassis. 

Figure 1-8 shows the power distribution units for the mainframe (left) 
and for the IOS or SSD (right). 
















Figure 1-8. Power Distribution Units 
MOTOR-GENERATOR UNITS

The motor-generator units convert primary power from the commercial power 
mains to the 400-Hz power used by the system. These units isolate the 
system from transients and fluctuations on the commercial power mains. 
The equipment consists of two or three motor-generator units and a 
control cabinet. Figure 1-9 shows a typical motor-generator and its 
control cabinet. 





Figure 1-9. Motor-generator Equipment 
SYSTEM CONFIGURATION

Figures 1-10 and 1-11 illustrate two configurations for the CRAY X-MP 
four-processor computer systems. 




Figure 1-10. Block Diagram of a Typical Four-processor System
with Full Disk Capacity
(Diagram: CRAY X-MP mainframe with 4, 8, or 16 million 64-bit words;
the legend distinguishes Cray 6 Mbyte, 100 Mbyte, and 1250 Mbyte channels.)
Figure 1-11. Block Diagram of a Typical Four-processor System
with Block Multiplexer Channels
(Diagram: CRAY X-MP mainframe with 4, 8, or 16 million 64-bit words;
the legend distinguishes Cray 6 Mbyte, 100 Mbyte, and 1250 Mbyte channels.)
CPU RESOURCES



All four central processing units (CPUs) share the mainframe's Central 
Memory, the inter-CPU communication section, and the input/output (I/O) 
section. The following pages describe these areas common to all CPUs. 



CENTRAL MEMORY 

Central Memory consists of a number of banks of solid-state,
random-access memory (RAM) and is shared by the CPUs and the I/O
section. Three Central Memory sizes are available with either 16K- or
64K-chip technology: 4 million words with 32 banks (64K chips),
8 million words with either 32 banks (64K chips) or 64 banks (16K chips),
or 16 million words with 64 banks (64K chips). Banks are independent of
each other; sequentially addressed words reside in sequential banks.
Each word is 72 bits, with 64 data bits and 8 check bits.

Central Memory cycle time is 4 clock periods (CPs). Access time, the
time required to fetch an operand from Central Memory to an operating
register, is 14 CPs for address (A) and scalar (S) registers. Access
time is 17 CPs plus vector length for a vector (V) register and 16 CPs
plus block length for a block transfer to an intermediate address (B)
or intermediate scalar (T) register.

The maximum transfer rate per CPU for B, T, and V registers is 3 words 
per CP; for A and S registers per CPU, it is 1 word every 2 CPs. 
Transfer of instructions to instruction buffers occurs at a rate of 
32 parcels (8 words) per CP. For the I/O section, the transfer rate is 
4 words per CP. 
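
The word-per-CP figures above can be restated as byte rates. The Python sketch below does the conversion; it assumes 8 data bytes per 64-bit word (check bits excluded) and 1 Mbyte = 10^6 bytes, and the function name is hypothetical.

```python
BYTES_PER_WORD = 8  # 64 data bits per word; the 8 check bits are not counted


def transfer_rate_mbytes_per_s(words_per_cp, cp_ns):
    """Convert a transfer rate in words per CP to Mbytes per second."""
    bytes_per_second = words_per_cp * BYTES_PER_WORD / (cp_ns * 1e-9)
    return bytes_per_second / 1e6


if __name__ == "__main__":
    rates = {
        "B, T, and V registers (per CPU)": 3,
        "A and S registers (per CPU)": 0.5,
        "instruction buffers": 8,
        "I/O section": 4,
    }
    for name, words_per_cp in rates.items():
        mbytes = transfer_rate_mbytes_per_s(words_per_cp, cp_ns=8.5)
        print(f"{name}: {mbytes:.0f} Mbytes/s")
```

These are peak figures derived only from the stated rates and the 8.5-ns CP; sustained rates depend on memory conflicts.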

Central Memory features are summarized and are described in detail in the 
following paragraphs. 

• Shared access from all CPUs

• 4 million, 8 million, or 16 million words of integrated circuit
  memory, using 16K or 64K chips

• 64 data bits and 8 error-correction bits per word

• 64 interleaved banks

• 4-CP bank cycle time

• Single-error-correction/double-error-detection (SECDED)

• 3 words per CP transfer rate to B, T, and V registers per CPU

• 1 word per 2-CP transfer rate to A and S registers per CPU

• 8 words per CP transfer rate to instruction buffers

• 4 words per CP transfer rate to I/O concurrent with all memory
  activity except instruction fetch and exchange



MEMORY ORGANIZATION 

Memory is organized to provide fast, efficient access for all CPUs. Data 
transfers to and from memory are corrected with SECDED. Central Memory 
is organized into four sections with 16 banks in each section. 

Each CPU is connected to an independent access path into each of the four 
sections, as shown in figure 2-1. This configuration allows up to 16 
memory references per CP. 
Figure 2-1. Central Memory Organization for a
Four-processor System

† The low-numbered 8 banks in each section are in a 32-bank system.

MEMORY ADDRESSING

Memory addressing is dependent on system memory architecture (chip size 
and number of banks) and memory size. The following paragraphs describe 
the memory addressing for the four-processor system. 



Memory addressing for 32-bank, 64K-chip, 4-million-word system

A word in a 32-bank, 64K-chip, 4-million-word memory is addressed with a
maximum of 22 bits (2^0 through 2^21), as shown in table 2-1. The
low-order 5 bits specify one of the 32 banks. The next 16-bit field
specifies an address within the chip. The high-order bit specifies one
chip on the module.



Memory addressing for 64-bank, 16K-chip, 8-million-word system

A word in a 64-bank, 16K-chip memory is addressed with a maximum of 23
bits (2^0 through 2^22), as shown in table 2-1. The low-order 6 bits
specify one of the 64 banks. The next 14-bit field specifies an address
within the chip. The high-order 3 bits specify one chip on the module.



Memory addressing for 32-bank, 64K-chip, 8-million-word system

A word in a 32-bank, 64K-chip, 8-million-word memory is addressed with a
maximum of 23 bits (2^0 through 2^22), as shown in table 2-1. The
low-order 5 bits specify one of the 32 banks. The next 16-bit field
specifies an address within the chip. The high-order 2 bits specify one
chip on the module.



Memory addressing for 64-bank, 64K-chip, 16-million-word system

A word in a 64-bank, 64K-chip memory is addressed with a maximum of 24
bits (2^0 through 2^23), as shown in table 2-1. The low-order 6 bits
specify one of the 64 banks. The next 16-bit field specifies an address
within the chip. The high-order 2 bits specify one chip on the module.
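
The field layouts described above can be sketched in Python as follows. The chip-select widths used here are derived from the memory sizes (for example, 4 million words = 32 banks x 64K addresses x 2 chips, giving a 1-bit chip select), which is an assumption of this sketch; the names `AddressFormat`, `FORMATS`, and `split_address` are hypothetical.

```python
from collections import namedtuple

# Field widths in bits, low-order field first.
AddressFormat = namedtuple(
    "AddressFormat", "bank_bits chip_address_bits chip_select_bits"
)

# Keyed by (chip type, number of banks, memory size in millions of words).
FORMATS = {
    ("16K", 64, 8): AddressFormat(6, 14, 3),
    ("64K", 32, 4): AddressFormat(5, 16, 1),
    ("64K", 32, 8): AddressFormat(5, 16, 2),
    ("64K", 64, 16): AddressFormat(6, 16, 2),
}


def split_address(address, fmt):
    """Split a word address into (chip select, address in chip, bank)."""
    total_bits = sum(fmt)
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address out of range for this memory format")
    bank = address & ((1 << fmt.bank_bits) - 1)
    chip_address = (address >> fmt.bank_bits) & ((1 << fmt.chip_address_bits) - 1)
    chip_select = address >> (fmt.bank_bits + fmt.chip_address_bits)
    return chip_select, chip_address, bank
```

Because the bank number occupies the low-order bits, consecutive word addresses fall in consecutive banks, and the address within the chip advances only after every bank has been visited once.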



MEMORY ACCESS 

Each CPU in the system has four memory access ports, referred to as Port
A, Port B, Port C, and I/O. Each port is capable of making one reference
per CP. Ports A, B, and C are used for CPU register transfers.

B, T, and vector memory instructions issue to a particular memory port:

• Vector read (block reads only) and B read instructions
  (176 and 034) use Port A

• Vector read (block reads only) and T read instructions (176 and 036)
  use Port B

• Vector store, B, or T store instructions (177, 035, and 037) and
  scalar instructions (100 through 137) use Port C

Once an instruction issues to a port, that port is reserved until all
references are made for that instruction.
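
The port assignments listed above can be summarized as a small lookup, sketched here in Python. The opcode is taken as the octal instruction code quoted in the list; the function name is hypothetical, and the sketch does not model how the hardware chooses between Port A and Port B for a vector read.

```python
def memory_ports_for(opcode_octal):
    """Return the set of memory ports an instruction may issue to.

    opcode_octal is the instruction code as an octal string, e.g. "176".
    An empty set means the instruction is not a memory reference
    covered by the list above.
    """
    opcode = int(opcode_octal, 8)
    if opcode == 0o176:                      # vector read
        return {"A", "B"}
    if opcode == 0o034:                      # B read
        return {"A"}
    if opcode == 0o036:                      # T read
        return {"B"}
    if opcode in (0o177, 0o035, 0o037):      # vector, B, or T store
        return {"C"}
    if 0o100 <= opcode <= 0o137:             # scalar memory instructions
        return {"C"}
    return set()
```

For example, `memory_ports_for("034")` returns `{"A"}` and `memory_ports_for("120")` returns `{"C"}`.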



Table 2-1. Memory Addressing Formats

                                           Address Format (bit positions)
Chip   Central Memory   No. of   No. of    Chip address   Internal bit      Bank
Type   (million words)  Banks    Columns   select         address in chip

16K          8            64       12      2^22 - 2^20    2^19 - 2^6        2^5 - 2^0 (6-bit)
64K          4            32       12      2^21           2^20 - 2^5        2^4 - 2^0 (5-bit)
64K          8            32       12      2^22 - 2^21    2^20 - 2^5        2^4 - 2^0 (5-bit)
64K         16            64       12      2^23 - 2^22    2^21 - 2^6        2^5 - 2^0 (6-bit)


The references for each element of a block transfer (V, B, or T) are made 
and completed in sequence through a port. However, since each reference 
is examined individually for possible conflicts, the data flow for a 
transfer may not be continuous. If an instruction requires a port that is 
busy, issue is blocked. Total execution time of the transfer depends on 
the number and type of conflicts encountered during the transfer. 
NOTE

Because concurrent block reads and writes are not 
examined for memory overlap hazard conditions (that is, 
read before write or write before read), the software 
must detect where this condition occurs and ensure 
sequential operation. 



The bidirectional memory mode enable (002500), bidirectional memory mode 
disable (002600), and the complete memory reference (002700) instructions 
are provided to resolve these cases and assure sequential operation. If 
the bidirectional memory mode is clear, block reads and writes are not 
allowed to operate concurrently within that CPU. Instruction 002700 
allows the program to wait until the last references of all preceding 
block transfers are past the conflict resolution stage within the CPU 
issuing it and the transferred data is being transmitted to the 
designated memory or register locations. Instruction 002700 provides 
software a mechanism, wherever necessary in the program, to guarantee 
sequential memory operation within a CPU or between CPUs. 

Issue of scalar memory references requires Ports A, B, and C to be 
available, ensuring sequential operation between block transfers and 
scalar references within a CPU. 

A scalar reference conflict is detected in CP 4 of execution. If a 
conflict occurs, two more scalar references are allowed to issue. A 
fourth scalar reference holds issue if the conflict condition still 
exists for the first scalar reference. 

Scalar references always execute in the order they are issued within a 
CPU. Instruction 002700 detects when all scalar references are past the 
conflict resolution stage within the CPU issuing it. 

An I/O channel references memory through a specific CPU's I/O port (refer
to the subsection on CPU Input/Output). The I/O port can be active
regardless of the activities on Ports A, B, or C.

For instruction fetches and exchange sequences, the CPUs are allowed
access to memory in pairs; CPUs 0 and 1 comprise one pair, CPUs 2 and 3
another pair. Only one instruction fetch or exchange sequence can occur
among the four CPUs at a time.

When a CPU requests an instruction fetch, referencing from all memory 
ports associated with that CPU pair is inhibited and the 32 banks being 
referenced are reserved (to prevent referencing from the other CPU 
pair). When memory is quiet (0 to 3 CPs), the fetch proceeds and 
references 32 banks in the next 4 CPs. Referencing of the eight ports is 
not enabled until 3 CPs later, to ensure all 32 banks are quiet. 
NOTE

A fetch sequence that follows a scalar store can, under
certain conditions, complete before the store. For
this to happen, however, an out-of-buffer condition
must arise before the scalar store is in CP 2 of
execution. The out-of-buffer condition can occur
before the scalar store is in CP 2 of execution if a
buffer boundary is crossed without doing a branch.
This presents a problem only if the fetch and store are
to the same area in memory. Therefore, software that
utilizes dynamic coding should ensure that the code
generated is actually in memory before that area of
memory is fetched into the instruction buffers.



During this time, the other CPU pair has access to the remaining banks of 
memory. 

When a CPU requests an exchange, all referencing from the four memory 
ports of the other CPU in the CPU pair is inhibited and 32 banks are 
reserved (to prevent referencing from the other CPU pair). When memory 
is quiet (0 to 3 CPs), the exchange proceeds and references 16 banks in 
the next 20 CPs. Each bank is referenced twice during this time, once 
for a read and once for a write. An exchange sequence requires all 
activities within a CPU to complete before the exchange request is made. 
As with the instruction fetch, the other CPU pair has access to the 
remaining banks of memory. 

A fetch request follows immediately after the exchange is complete and 
then referencing from the memory ports of the other CPU in the pair is 
enabled. 



Conflict resolution 

During each CP, references to the memory ports in the system are examined 
for memory access conflicts. If a conflict occurs for a reference, the 
reference is held and no further referencing from that port is allowed 
until the conflict is resolved. 

Three types of memory access conflicts can occur: Bank Busy, 
Simultaneous Bank, and Section Access. 
Bank Busy conflict - The Bank Busy conflict is caused by any port
within or between CPUs requesting a bank currently in a reference
cycle. Resolution of this conflict occurs when the bank cycle is
complete. All ports in the CPU are held 1, 2, or 3 CPs because of a
Bank Busy conflict.

Simultaneous Bank conflict - The Simultaneous Bank conflict is caused
by two or more ports in different CPUs requesting the same bank.
Resolution of this conflict is based on a priority (refer to
subsection on Memory access priorities). All ports in a CPU are held
1 CP because of a Simultaneous Bank conflict. A Bank Busy conflict
always follows a Simultaneous Bank conflict.

Section Access conflict - The Section Access conflict is caused by two
or more ports in the same CPU requesting any bank in the same
section. Resolution of this conflict is based on priority. The
highest priority port is allowed to proceed; all other ports involved
in this conflict hold (refer to subsection on Memory access
priorities). The port is held 1 CP because of a Section Access
conflict.
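
The effect of a Bank Busy conflict on a strided block transfer can be illustrated with a deliberately simplified Python model. It assumes a single port issuing one reference per CP, 64 banks selected by the low-order address bits, and a 4-CP bank cycle; it ignores Simultaneous Bank and Section Access conflicts, other ports, and other CPUs. The function name is hypothetical.

```python
def simulate_strided_references(stride, count, banks=64, bank_cycle_cps=4):
    """Return (total CPs, CPs spent held) for one port making `count`
    references whose addresses advance by `stride` words each time."""
    busy_until = [0] * banks  # first CP at which each bank is free again
    clock = 0
    held = 0
    for i in range(count):
        bank = (i * stride) % banks
        if busy_until[bank] > clock:
            # Bank Busy conflict: hold until the bank cycle completes.
            held += busy_until[bank] - clock
            clock = busy_until[bank]
        busy_until[bank] = clock + bank_cycle_cps
        clock += 1  # the port can make one reference per CP
    return clock, held


if __name__ == "__main__":
    for stride in (1, 16, 64):
        total, held = simulate_strided_references(stride, count=8)
        print(f"stride {stride}: {total} CPs total, {held} CPs held")
```

In this model a stride of 1 never revisits a busy bank, while a stride equal to the number of banks sends every reference to the same bank, so each reference after the first is held for 3 CPs.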



Memory access priorities 

The following priorities are used to resolve memory access conflicts. 

• Intra-CPU priority: the priority between Ports A, B, and C is 
determined by the following conditions: 

Any port with an odd increment always has a higher priority 
than a port with an even increment, regardless of their 
issued sequence. 

Among all ports with the same type of increment (odd or 
even), the relative time of issue determines the priority, 
with the first issued having the highest priority. 

• Inter-CPU priority: every 4 CPs the priority between CPUs 
changes. 

• I/O priority: the I/O ports always have the lowest priority within
their CPUs.
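
The intra-CPU rule can be expressed as a sort, as in the following Python sketch. Each request is described by a port name, its address increment, and the CP at which its instruction issued; this tuple layout and the function name are hypothetical, and the inter-CPU and I/O rules are not modeled.

```python
def intra_cpu_priority(requests):
    """Order port requests from highest to lowest priority.

    requests is a list of (port_name, increment, issue_cp) tuples.
    Odd increments outrank even increments; among requests with the
    same type of increment, the earliest issued comes first.
    """
    ordered = sorted(
        requests,
        key=lambda request: (request[1] % 2 == 0, request[2]),
    )
    return [port_name for port_name, _, _ in ordered]


if __name__ == "__main__":
    requests = [("A", 2, 0), ("B", 1, 5), ("C", 3, 2)]
    print(intra_cpu_priority(requests))
```

In the example, Port A issued first but has an even increment, so it ranks below Ports C and B, which have odd increments and are ordered by issue time.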



MEMORY ERROR CORRECTION 

A SECDED network is used between a CPU and memory. SECDED assures that 
data written into memory can be returned to the CPU with consistent 
precision (figure 2-2). 
Figure 2-2. Memory Datapath with SECDED



If a single bit of a data word is altered, the single error alteration is 
automatically corrected before passing the data word to the computer. If 
2 bits of the same data word are altered, the error is detected but not 
corrected. In either case, the CPU can be interrupted, depending on 
interrupt options selected to allow processing of the error. For 3 or 
more bits in error, results are ambiguous. 

The SECDED error processing scheme is based on error detection and 
correction codes devised by R. W. Hamming.* An 8-bit check byte is 
appended to the 64-bit data word before the data is written in memory. 
The 8 check bits are generated as even parity bits for a specific group 
of data bits. Figure 2-3 shows the bits of the data word used to 
determine the state of each check bit. An X in the horizontal row 
indicates that data bit contributes to the generation of that check bit. 
Thus, check bit 0 is the bit that makes group parity even for the group 
of bits 2^1, 2^3, 2^5, 2^7, 2^9, 2^11, 2^13, 2^15, 2^17, 2^19, 2^21, 
2^23, 2^25, 2^27, 2^29, and 2^31 through 2^55. 

The 8 check bits and the data word are stored in memory at the same 
location. When read from memory, the same 64-bit matrix of figure 2-3 is 
used to generate a new set of check bits, which are compared with the old 
check bits. The resulting 8 comparison bits are called syndrome** bits 
(S bits). The states of these S bits are all symptoms of any error that 
occurred (1 = no compare). If all syndrome bits are 0, no memory error is 
assumed. 

Any change of state of a single bit in memory causes an odd number of 
syndrome (S) bits to be set to 1. A double error (an error in 2 bits) 
appears as an even number of syndrome bits set to 1. 



*  Hamming, R. W., "Error Detecting and Error Correcting Codes," Bell 
   System Technical Journal, 29, No. 2, pp. 147-160 (April, 1950). 

** Syndrome: Any set of characteristics regarded as identifying a 
   certain type, condition, and so on. (Webster's New World Dictionary) 
The matrix is designed so that: 

• If all S bits are 0, no error is assumed. 

• If only 1 S bit is 1, the associated check bit is in error. 

• If more than 1 S bit is 1 and the parity of S bits S0 through S7 
is even, a double error (or an even number of bit errors) occurred 
within the data bits or check bits. 

• If more than 1 S bit is 1 and the parity of all S bits is odd, a 
single and correctable error is assumed to have occurred. The S 
bits can be decoded to identify the bit in error. 

• If 3 or more memory bits are in error, the parity of all S bits is 
odd and results are ambiguous. 
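
The decision rules above depend only on how many syndrome bits are set, so they can be sketched without the matrix itself. The function name classify_syndrome and the returned labels are illustrative assumptions, not part of the hardware description.

```python
def classify_syndrome(syndrome: int) -> str:
    """Classify an 8-bit syndrome (S0 through S7) using the listed rules."""
    if not 0 <= syndrome <= 0xFF:
        raise ValueError("syndrome must fit in 8 bits")
    ones = bin(syndrome).count("1")  # number of S bits equal to 1
    if ones == 0:
        return "no error"
    if ones == 1:
        return "check bit error"             # the associated check bit
    if ones % 2 == 0:
        return "double error"                # detected, not corrected
    return "single correctable error"        # odd parity, more than 1 S bit


for s in (0b00000000, 0b00000100, 0b00000011, 0b00000111):
    print(f"{s:08b} -> {classify_syndrome(s)}")
```

As the last rule in the list notes, 3 or more bits in error can also produce an odd syndrome, so the "single correctable error" result is an assumption made by the hardware rather than a certainty.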

Modules involved with generating and interpreting the 8-bit check byte 
used for SECDED include logic that can be used for verifying check bit 
storage, check bit generation, and error detection and correction. Refer 
to appendix D for information on SECDED maintenance functions. 

[Figure 2-3 matrix: rows are check bits 0 through 7; columns are the 
check byte (bits 2^71 through 2^64) followed by data bits 2^63 through 
2^0. An X marks each bit that contributes to a given check bit.] 


Figure 2-3. Error Correction Matrix 
INTER-CPU COMMUNICATION SECTION 

The inter-CPU communication section of the system contains special 
hardware for communication among the CPUs, for control, and for a 
Real-time Clock (RTC). The RTC, Shared Address (SB), Shared Scalar (ST), 
and Semaphore (SM) registers are shared by the CPUs. These registers 
with their sources and destinations are shown in figure 2-4 and described 
in the following paragraphs. 

Figure 2-4. Shared Registers and Real-time Clock 



REAL-TIME CLOCK 

The mainframe contains one RTC register shared by the CPUs. Programs can 
be timed precisely by using the CP counter, which is 64 bits wide and 
advances one count each CP. Since the clock advances synchronously with 
program execution, it can be used to time the program to an exact number 
of CPs. In such an application, however, the count can include CPs 
from other tasks if an interrupt occurs before the end time is read. 
Instructions used with the RTC register are: 

Octal Code    CAL Syntax    Description 

0014j0        RT Sj         Enter the RTC register with (Sj) 
072i00        Si RT         Transmit (RTC) to Si 

A program reads the CP counter using instruction 072 and resets it with 
instruction 0014j0. Loading or reading the CP counter can occur from 
all CPUs simultaneously. If more than one CPU is in monitor mode, 
the software should ensure that only one CPU enters a value into this 
register. 
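
Timing a code sequence with the RTC reduces to subtracting two counter readings. The sketch below assumes the two readings were obtained with instruction 072i00; the function names and the cp_seconds parameter (the length of one CP, which is not given in this section) are hypothetical.

```python
RTC_BITS = 64
RTC_MASK = (1 << RTC_BITS) - 1


def elapsed_cps(start: int, end: int) -> int:
    """Clock periods between two RTC readings, modulo the 64-bit counter."""
    return (end - start) & RTC_MASK


def elapsed_seconds(start: int, end: int, cp_seconds: float) -> float:
    """Convert the CP count to seconds for a caller-supplied CP length."""
    return elapsed_cps(start, end) * cp_seconds


print(elapsed_cps(1000, 1750))
```

The masked subtraction stays correct if the counter wraps once between readings. It does not account for CPs charged to other tasks after an interrupt, nor for another CPU entering a new value into the register.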



INTER-CPU COMMUNICATION AND CONTROL 

Five identical sets of shared registers are used for communication and 
control among CPUs. Each set contains eight 24-bit Shared Address (SB) 
registers, eight 64-bit Shared Scalar (ST) registers, and 32 1-bit 
Semaphore (SM) registers. 

Each CPU's Cluster Number (CLN) register determines which set of shared 
registers is accessed by a CPU (clustering). The CLN register is loaded 
from the Exchange Package or, if the CPU is in monitor mode, through 
instruction 0014j3. 

The CLN register can contain one of six different values. Values 1, 2, 
3, 4, or 5 allow the CPU to access one of the five sets of shared 
registers. Value 0 prevents any access to shared registers by the CPU. 
If the value is 0, instructions regarding the shared registers become 
no-ops, except for the instructions returning values to Ai or Si, 
which return a zero value. If the CLN registers in more than one CPU are 
set to the same value (1, 2, 3, 4, or 5), then those CPUs share a common 
set of SB, ST, and SM registers. 
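
A small model may make the clustering rules concrete. The classes SharedRegisters and Cpu and their methods are hypothetical names for illustration; only the SB registers are modeled, and the ST and SM registers would follow the same pattern.

```python
class SharedRegisters:
    """One set of eight 24-bit SB registers for each cluster number 1 to 5."""

    def __init__(self):
        self.sb = {cluster: [0] * 8 for cluster in range(1, 6)}


class Cpu:
    def __init__(self, shared: SharedRegisters, cln: int = 0):
        if not 0 <= cln <= 5:
            raise ValueError("CLN must be 0 through 5")
        self.shared = shared
        self.cln = cln

    def write_sb(self, j: int, value: int) -> None:
        if self.cln == 0:
            return  # CLN = 0: the instruction becomes a no-op
        self.shared.sb[self.cln][j] = value & 0xFFFFFF  # SB holds 24 bits

    def read_sb(self, j: int) -> int:
        if self.cln == 0:
            return 0  # CLN = 0: a zero value is returned
        return self.shared.sb[self.cln][j]


shared = SharedRegisters()
cpu_a, cpu_b = Cpu(shared, cln=2), Cpu(shared, cln=2)
cpu_c, cpu_d = Cpu(shared, cln=3), Cpu(shared, cln=0)
cpu_a.write_sb(1, 0o1234)
print(cpu_b.read_sb(1), cpu_c.read_sb(1), cpu_d.read_sb(1))
```

CPUs A and B share cluster 2 and see the same value; CPU C is in a different cluster, and CPU D (CLN = 0) has no access at all.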



Shared Address and Shared Scalar registers 

The SB and ST registers are used for passing address and scalar 
information from one CPU to another. No hardware reservations are made 
on these registers. Any necessary reservations to restrict access to 
these registers must be handled in the software through use of the SM 
registers or by shared memory design. The single hardware restriction on 
access to the SB and ST registers is that only one read or one write 
operation can occur in a CP. 
The instructions used with the SB and ST registers are: 

Octal Code    CAL Syntax    Description 

026ij7        Ai SBj        Transmit (SBj) to Ai 
027ij7        SBj Ai        Transmit (Ai) to SBj 
072ij3        Si STj        Transmit (STj) to Si 
073ij3        STj Si        Transmit (Si) to STj 

Semaphore registers 

The SM registers are used for control among the CPUs. No hardware 
reservations are made on these registers. Loading or reading the SM 
registers or setting or clearing a particular SM register can occur at 
any time from any or all CPUs. 

The test and set instruction (0034jk) is the only operation on the SM 
registers including a hardware interlock. This interlock prevents a 
simultaneous test and set operation on the same SM register from more 
than one CPU. The test and set instruction first tests the value of the 
selected SM register. If the value is 0, the instruction issues and sets 
that SM register to a 1. If the value is 1, the instruction holds issue 
until the value is 0. 

When all CPUs in a cluster are holding issue on a test and set 
instruction, a deadlock interrupt can occur. All CPUs with equal cluster 
numbers above 0 belong to the same cluster and must be holding issue on a 
test and set instruction to cause a deadlock interrupt. When that 
happens, all CPUs in the cluster receive deadlock interrupts. If only 
one CPU belongs to a cluster and holds issue on a test and set 
instruction, that CPU receives a deadlock interrupt. No deadlock 
interrupt can occur in cluster 0 (CLN=0). 

When an interrupt occurs, normally the instructions already in the Next 
Instruction Parcel (NIP) and Current Instruction Parcel (CIP) registers 
are allowed to issue before the exchange sequence starts. If a test and 
set instruction is holding in the CIP register and an interrupt occurs, a 
special exchange start-up sequence is initiated. Here, the instruction 
in the NIP register and the test and set instruction in the CIP register 
are discarded and the Program Counter (P) register is adjusted to point 
to the discarded test and set instruction. The Waiting on Semaphore (WS) 
flag in the Exchange Package sets, indicating a test and set instruction 
was holding in the CIP register when the interrupt occurred. The 
exchange sequence is then started. 
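
The test and set and deadlock behavior described above can be modeled roughly as follows. The Cluster class, its method names, and the returned strings are hypothetical; the model treats each call as one attempt to issue and says nothing about hardware timing or the exchange sequence.

```python
class Cluster:
    """Toy model of one cluster's SM registers and test-and-set holds."""

    def __init__(self, cpu_ids):
        self.cpus = set(cpu_ids)
        self.sm = [0] * 32     # 32 1-bit semaphore registers
        self.holding = set()   # CPUs holding issue on a test and set

    def test_and_set(self, cpu, n):
        """Return 'issued', 'holding', or 'deadlock' for this attempt."""
        if self.sm[n] == 0:
            self.sm[n] = 1                # test found 0: set and issue
            self.holding.discard(cpu)
            return "issued"
        self.holding.add(cpu)             # test found 1: hold issue
        if self.holding == self.cpus:     # every CPU in the cluster holds
            self.holding.clear()          # all of them are interrupted
            return "deadlock"
        return "holding"

    def clear(self, n):
        self.sm[n] = 0


cluster = Cluster({0, 1})
print(cluster.test_and_set(0, 5))  # SM5 was 0
print(cluster.test_and_set(1, 5))  # SM5 is now 1
print(cluster.test_and_set(0, 5))  # both CPUs are now holding
```

A cluster with a single CPU reaches the deadlock case as soon as that CPU holds, which matches the single-CPU rule stated above.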

Instructions used with the SM registers are: 

Octal Code    CAL Syntax    Description 

0034jk        SMjk 1,TS     Test and set SMjk 
0036jk        SMjk 0        Clear SMjk 
0037jk        SMjk 1        Set SMjk 
072i02        Si SM         Transmit (SM) to Si 
073i02        SM Si         Transmit (Si) to SM 



Shared register and semaphore conflicts 

A scanner is used to break a tie caused by simultaneous requests for 
access to the Semaphores or Shared registers of any cluster. If there is 
no competition for access, no extra hold issues are generated. For 
example, an 027ij7 holds issue 3 CPs, but if there is an access conflict, 
issue holds until a scanner with four slots breaks the tie. A request 
takes 2 CPs to complete; therefore, subsequent requests can be accepted 
every other CP until all requests are resolved. 



CPU INPUT/OUTPUT SECTION 

The I/O section of the mainframe is shared by all CPUs. The mainframe 
supports three channel types identified by their maximum transfer rates 
of 1250 Mbytes per second, 100 Mbytes per second, and 6 Mbytes per second. 

Two 1250 Mbyte per second channel pairs transfer data between Central 
Memory and an SSD. These channels are 128 bits wide and use 16 check 
bits in each direction. A maximum transfer rate of more than 10 gigabits 
per second is possible on a 1250 Mbyte per second channel. The channel 
is two parallel 64-bit channels each with SECDED; therefore, under 
certain circumstances the full-width channel can correct double errors. 

Four 100 Mbyte per second channel pairs transfer data between Central 
Memory and an I/O Subsystem. A 100 Mbyte per second channel is 64 bits 
wide and uses 8 check bits in each direction. Data words are transferred 
in blocks of 16 under the control of Data Ready and Data Transmit control 
signals. Each 100 Mbyte per second channel has a maximum transfer rate 
of approximately 850 Mbits per second. 

IOS communication with the CPUs is over four pairs of control channels, 
each with a maximum transfer rate of 6 Mbytes per second. Each 6 Mbyte 
per second channel is 16 bits wide. 

All I/O (including 100 Mbyte and 1250 Mbyte per second channels) uses the 
I/O ports to memory. A scanner controls access to these ports. All CPU 
memory ports (Ports A, B, and C) have higher priority than the I/O ports. 
Channel features of the I/O section are summarized below and described in 
the remainder of this section. 
• Two channel pairs with a 1250 Mbytes per second maximum 
transfer rate per channel; 128 data bits and 16 check bits in 
each direction. 

• Four channel pairs with a 100 Mbytes per second maximum 
transfer rate per channel; 64 data bits, 3 control bits, and 
8 check bits in each direction. 

• Four I/O channel pairs with a 6 Mbytes per second maximum 
transfer rate per channel: 

  - Shared control from the CPUs 

  - 16 data bits, 3 control bits, and 4 parity bits in each 
    direction 

  - Lost data detection 

• Channels are divided into four groups; each group contains 
either input or output channels. 

• Channel groups are served equally by memory (each group is 
scanned every 4 CPs). 

• Channel priority is resolved within channel groups. 



DATA TRANSFER FOR SOLID-STATE STORAGE DEVICE 

Data is transferred directly between the SSD and the mainframe using the 
1250 Mbyte per second channels. A 1250 Mbyte per second channel is 
128 bits wide and is programmed through software. The SSD Solid-state 
Storage Device Hardware Reference Manual contains programming details for 
the SSD. 



DATA TRANSFER FOR I/O SUBSYSTEM 

A 100 Mbyte per second channel pair transfers data between Central Memory 
and the BIOP of the IOS. A second 100 Mbyte per second channel pair 
transfers data between Central Memory and a DIOP or XIOP.* Each 
channel is 64 bits wide and handles data at approximately 100 Mbytes/s. 
Each channel uses an additional 8 check bits for SECDED, as is used in 
Central Memory. 



* Software does not currently support data transfer using the 100 Mbyte 
per second channel pair to an XIOP. 
The CPU side of a 100 Mbyte per second channel pair uses a pair of 
16-word buffers to stream the data out of Central Memory and another pair 
to stream data into Central Memory. On output, as one buffer block is 
being sent to the IOP, the other buffer is filling from Central Memory. 
Similarly, on input, one buffer block is filling from an IOP while the 
other is transmitting to Central Memory. 

At the IOP side of a 100 Mbyte per second channel pair, data passing into 
Local Memory (an IOP's memory) is double buffered and disassembled into 
16-bit parcels. The channel side passing data from Local Memory simply 
assembles 16-bit parcels into 64-bit words for transmission to a CPU. 

An IOP controls a 100 Mbyte per second channel pair linking it with 
Central Memory. The IOP initiates all data transfers on the channel and 
performs all error processing required for the channel. There are no CPU 
instructions for the 100 Mbyte per second channel pair. The I/O 
Subsystem Hardware Reference Manual contains programming details for the 
100 Mbyte per second channels. 



6 MBYTE PER SECOND CHANNELS 

Standard control channels for the system are 6 Mbyte per second 
channels. Each 6 Mbyte per second channel has 16-bit asynchronous 
control logic used for front-end interfaces. The instructions used with 
6 Mbyte per second channels follow. 

Octal Code    CAL Syntax    Description 

0010jk        CA,Aj Ak      Set the Current Address (CA) register 
                            for the channel indicated by (Aj) to 
                            (Ak) and activate the channel 

0011jk        CL,Aj Ak      Set the channel Limit Address (CL) 
                            register for the channel indicated by 
                            (Aj) to (Ak) 

0012jk        CI,Aj         Clear the Interrupt flag and Error flag 
                            for the channel indicated by (Aj). 
                            Output channel: k=0, clear MC; k=1, set 
                            MC. Input channel: k=0, no operation; 
                            k=1, clear held ready. 

033i00        Ai CI         Transmit channel number to Ai 

033ij0        Ai CA,Aj      Transmit address of channel (Aj) to Ai 

033ij1        Ai CE,Aj      Transmit Error flag of channel (Aj) 
                            to Ai 



MULTI-CPU PROGRAMMING 

The 6 Mbyte per second I/O channels can operate from any CPU, and any CPU 
can issue instructions to any of the channels. There is no hardware 
interlock among the CPUs; therefore, software must ensure that only one 
CPU is servicing I/O at a time while in monitor mode. Instruction 033 is 
independent in nature and can be issued without an interlock. 

The following conditions must be met for an I/O interrupt to occur: 

• No CPU waiting for an exchange 

• No CPU in monitor mode 

• An interrupt is present 

Normally, the interrupt from a 6 Mbyte per second channel is directed 
toward the CPU that last issued a clear interrupt instruction (0012) to 
that channel. Because an I/O interrupt occurs, however, in only one CPU 
at a time, the following conditions (in priority order) determine the CPU 
toward which the interrupt is directed. Once in monitor mode, a CPU 
should service all I/O interrupts. 

1. All I/O interrupts are directed toward a CPU that has the select 
external interrupt mode set. 

2. If no CPU has selected external interrupts, then interrupts are 
directed toward a CPU holding issue on a test and set instruction. 

3. If neither condition 1 nor condition 2 exists, or if they exist in all CPUs, 
the interrupt is directed to the CPU that last issued a clear 
interrupt instruction to that channel. 
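
The routing rules above can be sketched as a small function. This is an interpretation, not a description of the hardware: the names CpuState and route_io_interrupt are hypothetical, the choice of the lowest-numbered CPU when several qualify is an assumption, and "exist in all CPUs" is read as applying to each condition separately.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    select_external: bool = False       # select external interrupt mode set
    holding_test_and_set: bool = False  # holding issue on a test and set
    monitor_mode: bool = False
    waiting_for_exchange: bool = False


def route_io_interrupt(cpus, last_clear_cpu, interrupt_present=True):
    """Return the index of the CPU to interrupt, or None if none can occur."""
    if not interrupt_present:
        return None
    if any(c.monitor_mode or c.waiting_for_exchange for c in cpus):
        return None
    for flag in ("select_external", "holding_test_and_set"):
        matches = [i for i, c in enumerate(cpus) if getattr(c, flag)]
        if len(matches) == len(cpus):
            return last_clear_cpu   # condition exists in all CPUs: rule 3
        if matches:
            return matches[0]       # assumption: lowest-numbered match
    return last_clear_cpu           # neither condition exists: rule 3


cpus = [CpuState() for _ in range(4)]
cpus[3].holding_test_and_set = True
print(route_io_interrupt(cpus, last_clear_cpu=2))
```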



6 MBYTE PER SECOND CHANNEL OPERATION 

Input and output channels access Central Memory directly. Input channels 
store external data in memory and output channels read data from memory. 
A primary task of a channel is to convert 64-bit Central Memory words 
into 16-bit parcels or 16-bit parcels into 64-bit Central Memory words. 
Four parcels make up one Central Memory word with bits of the parcels 
assigned to memory bit positions as shown in table 2-2. In both input 
and output operations, parcel 0 is always transferred first. 

Each input or output channel has a data channel (4 parity bits, 16 data 
bits, and 3 control lines), a 64-bit assembly or disassembly register, a 
channel CA register, and a channel CL register. 

Three control signals (Ready, Resume, and Disconnect) coordinate the 
transfer of parcels over the channels. In addition to the three control 
signals, the output channel of a pair has a Master Clear line. 
Appendix B describes the signal sequence of a 6 Mbyte per second channel. 
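
The word assembly and disassembly performed by a channel can be sketched as shown here, using the parcel-to-bit assignments of table 2-2 (parcel 0 occupies bits 2^63 through 2^48). The function names are hypothetical. Leaving missing trailing parcels as zeros mirrors how an input channel stores a partially assembled word.

```python
def assemble_word(parcels):
    """Pack up to four 16-bit parcels into a 64-bit word, parcel 0 first."""
    if len(parcels) > 4:
        raise ValueError("a word holds at most four parcels")
    word = 0
    for index, parcel in enumerate(parcels):
        # Parcel 0 lands in bits 63-48, parcel 1 in bits 47-32, and so on.
        word |= (parcel & 0xFFFF) << (48 - 16 * index)
    return word


def disassemble_word(word):
    """Split a 64-bit word into four 16-bit parcels, parcel 0 first."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


word = assemble_word([0x1111, 0x2222, 0x3333, 0x4444])
print(hex(word), [hex(p) for p in disassemble_word(word)])
```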
I/O interrupts can be caused by the following: 

• On all output channels, if (CA) becomes equal to (CL), then the 
resume for the last parcel transmitted sets interrupt. 

• External device disconnect is received on any input channel and 
channel is active. 

• Channel error condition occurs (described later in this 
section). 



Table 2-2. Channel Word Assembly/Disassembly 

                                      Number 
Characteristic        Bit Position    of Bits   Comment 

Channel data bits     2^15 - 2^0      16        Four 4-bit groups 
Channel parity bits                   4         One per 4-bit group 
CRAY X-MP word        2^63 - 2^0      64 
Parcel 0              2^63 - 2^48     16        First in or out 
Parcel 1              2^47 - 2^32     16        Second in or out 
Parcel 2              2^31 - 2^16     16        Third in or out 
Parcel 3              2^15 - 2^0      16        Fourth in or out 



The number of the channel causing an interrupt can be determined by 
using instruction 033, which reads into Ai the highest priority 
channel number requesting an interrupt. The lowest numbered channel 
has the highest priority. The interrupt request continues until 
cleared by the monitor program, at which point an interrupt from the 
next highest priority channel, if present, is sensed. All interrupts 
are available through instruction 033 to all CPUs. Channel numbers for 
6 Mbyte per second channels are octal 10 through 17 (10/11, 12/13, 
14/15, and 16/17; even for input, odd for output). 



INPUT CHANNEL PROGRAMMING 

To start an input operation, the CPU program (refer to figure 2-5): 

1. Sets the channel CL to the last word address (LWA) + 1 (LWA+1) 

2. Sets the channel CA to the first word address (FWA) 
Setting the current address causes the Channel Active flag to set. The 
channel is then ready to receive data. When a 4-parcel word is assembled, 
the word is stored in memory at the address contained in the CA register. 
When the word is accepted by memory, the current address is advanced by 1. 

An external transmitting device sends a Disconnect signal to indicate end 
of a transfer. When the Disconnect signal is received, the Channel 
Interrupt flag sets and a test is performed to check for a partially 
assembled word. If a partial word is found, the valid portion of the 
word is stored in memory and the unreceived, low-order parcels are stored 
as zeros. 

The Interrupt flag sets when a Disconnect signal is received or when the 
Channel Error flag is set. 



[Flowchart: Begin; set channel limit; set channel address (channel is 
activated); data is transferred; receive interrupt; get channel 
interrupt number; determine number of words transferred; clear 
interrupt and error flags; continue or abort.] 


Figure 2-5. Basic I/O Program Flowchart 



INPUT CHANNEL ERROR CONDITIONS 

Input channel error conditions can occur at a parcel level (parity 
error). When a parcel in error occurs, the Parity Fault flag sets 
immediately. The Parity Fault flag does not generate an interrupt; 
instead, it is saved and sets the Error flag when a disconnect occurs 
or if CA = CL. 
Therefore, the program should check the state of the Error flag when an 
interrupt is honored. All parcels stored after the error are zeroed. 

If a Ready signal is received when the channel is not active, the Ready 
condition is held until the channel is activated. At this time, a Resume 
signal is sent. No Error flag is set and no interrupt request is 
generated. Since the Ready condition is held when the channel is 
inactive, it is sometimes advantageous to be able to clear this Ready 
signal before setting up the channel, especially on a deadstart or a 
resynchronization of the channel after an error. The Ready signal can be 
cleared by using instruction 0012j1 to input channel (Aj), clearing 
any Ready signal being held before issue of instruction 0012j1. 



OUTPUT CHANNEL PROGRAMMING 

To start an output operation, the CPU program: 

1. Sets the channel limit address to the LWA + 1 

2. Sets the channel current address to the first word address (FWA) 

Setting the current address causes the Channel Active flag to be set. 
The channel reads the first word from memory addressed by the contents of 
the CA register. When the word is received from memory, the channel 
advances the current address by 1 and starts the data transfer. 

After each word is read from memory and the current address is advanced, 
the limit test is made, comparing the contents of the CA register and the 
CL register. If they are equal, the operation is complete as soon as the 
last parcel transfer is finished. 

The Interrupt flag also sets if an error is detected. An output channel 
detects two errors: a Resume signal received when the channel is 
inactive and a Resume signal received while a Read Reference Request 
is present. No external response is generated. 



PROGRAMMED MASTER CLEAR TO EXTERNAL DEVICE 

The system can send a Master Clear signal to an external device through 
the output channel. The external Master Clear sequence is as follows: 

Step   Octal Code   Description 

1.     0012jk       Clears input channel to ensure external 
                    activity on the channel pair has stopped. 

2.     0012j1       Clears output channel to ensure CPU activity on 
                    the channel pair has stopped; sets Master Clear. 

3.     Delay 1      Device dependent; determines the duration of 
                    the Master Clear signal. 

4.     0012j0       Clears the output channel; this turns off the 
                    Master Clear signal. 

5.     Delay 2      Device dependent; allows time for 
                    initialization activities in the attached 
                    device to complete. 



For CRI front-end interfaces, delays 1 and 2 should each be a minimum of 
80 CPs. 



ACCESS TO CENTRAL MEMORY 

Each CPU has one I/O port to memory. Channels are divided into four 
groups and scanned to allow access to memory. Each of the four channel 
groups shown below is assigned a time slot (figure 2-6) that is scanned 
for a memory request once every 4 CPs. The channel listed first in each 
group has the highest priority. During the next 3 CPs, the scanner 
allows requests from the other three channel groups. Therefore, an I/O 
memory request can occur every CP. The scanner stops for all memory 
conflicts caused by an I/O reference and also stops for a block (100 Mbyte 
per second channel) reference while a buffer is referencing, maximum 16 
words (figure 2-7). 

Channels A, B, C, and D are 100 Mbyte per second channels. Channels 6 and 
7 are 1250 Mbyte per second channels. Channels 10 through 17 are 6 Mbyte 
per second channels. 

                          CPU 

Group         0        1        2        3 

0 (input)     A, 10    B, 14    C        D 
1 (output)    A, 11    B, 15    C        D 
2 (input)     7, 12    7, 16    6        6 
3 (output)    7, 13    7, 17    6        6 



Figure 2-6. Channel I/O Control 



Figure 2-7. Input/Output Datapaths 



I/O LOCKOUT 

An I/O memory request can be locked out by an exchange sequence or 
instruction fetch sequence. 



MEMORY BANK CONFLICTS 

Memory bank conflicts are tested for CPU scalar, vector, and I/O 
memory references. When an exchange sequence or instruction fetch 
sequence is in progress, all other memory references for the CPU pair 
are locked out. 

Each memory bank can accept a new request every 4 CPs. To test for a 
memory bank conflict, the 6 low-order bits of the memory address are 
checked against Bank Busy conflicts and other memory references. The 
bank is busy for 4 CPs on a reference. 
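
The bank conflict test can be illustrated with a short model: the bank number is taken from the 6 low-order address bits, and a bank that accepts a reference stays busy for 4 CPs. The class and method names are hypothetical, and the model ignores simultaneous conflicts between ports as well as fetch and exchange lockout.

```python
BANK_BITS = 6                    # 6 low-order address bits select the bank
NUM_BANKS = 1 << BANK_BITS       # 64 banks
BANK_BUSY_CPS = 4                # a bank is busy for 4 CPs per reference


def bank_of(address: int) -> int:
    return address & (NUM_BANKS - 1)


class BankBusyTracker:
    def __init__(self):
        self.free_at = [0] * NUM_BANKS  # first CP at which each bank is free

    def request(self, address: int, cp: int) -> bool:
        """Return True if the reference is accepted at clock period cp."""
        bank = bank_of(address)
        if cp < self.free_at[bank]:
            return False                # bank busy: the reference holds
        self.free_at[bank] = cp + BANK_BUSY_CPS
        return True


tracker = BankBusyTracker()
print(tracker.request(0o100, cp=0))  # bank 0 is free
print(tracker.request(0o200, cp=1))  # bank 0 again, still busy
print(tracker.request(0o201, cp=1))  # bank 1 is free
print(tracker.request(0o200, cp=4))  # bank 0 is free again
```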



I/O MEMORY CONFLICTS 

Before testing for a memory bank conflict, a check is made to ensure 
no exchange sequence or instruction fetch sequence is in progress. If 
either of these conditions exists, the I/O request is held. The 6 
low-order address bits are tested against Bank Busy conflicts and 
other memory references. If a bank being referenced is busy, the 
reference is held and the scanner is stopped. 



I/O MEMORY REQUEST CONDITIONS 

The following conditions must be present for an I/O memory request to 
be processed: 

• I/O request 

• Bank not busy 

• No simultaneous conflicts with other memory ports 

• No fetch request within the CPU pair 

• No exchange sequence within the CPU pair 



HR-0097 2-23 



I/O MEMORY ADDRESSING 

All I/O memory references are absolute. The CA and CL registers are 
24 bits, allowing I/O access to all of memory. Setting of the CA and 
CL registers is limited to monitor mode. I/O memory reference 
addresses are not checked for range errors. 



CPU CONTROL SECTION 



All CPUs have identical, independent control sections containing 
registers and instruction buffers for instruction issue and control. A 
control section uses an exchange mechanism for switching instruction 
execution from program to program. This section describes these 
registers and buffers and the exchange mechanism. Memory field 
protection, programmable clock, and deadstart sequence are also described. 



INSTRUCTION ISSUE AND CONTROL 

The following paragraphs describe the registers and instruction buffers 
involved with instruction issue and control. Figure 3-1 illustrates the 
general flow of instruction parcels through the registers and buffers. 



[Diagram: P register, instruction buffers, NIP, CIP, and LIP registers, 
and issue.] 



Figure 3-1. Instruction Issue and Control Elements 



PROGRAM ADDRESS REGISTER 

The 24-bit Program Address (P) register indicates the next parcel of 
program code to enter the Next Instruction Parcel (NIP) register. The 
high-order 22 bits of the P register indicate the word address for the 
program word in memory relative to the base address. (Program size is 
limited to 4 million words.) The low-order 2 bits indicate the parcel 
within the word. Except on a branch instruction when the branch is taken 
or on an exchange, the P register contents are advanced by 1 when an 
instruction parcel enters the NIP register. 
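
The P register layout described above (a 22-bit word address followed by a 2-bit parcel number) can be illustrated as follows. The helper names split_p and advance_p are hypothetical.

```python
P_BITS = 24
P_MASK = (1 << P_BITS) - 1


def split_p(p: int):
    """Return (word_address, parcel) for a 24-bit P register value."""
    p &= P_MASK
    return p >> 2, p & 0b11   # high-order 22 bits, low-order 2 bits


def advance_p(p: int) -> int:
    """Advance P by one parcel, as when a parcel enters the NIP register."""
    return (p + 1) & P_MASK


p = 0b1011                    # word 2, parcel 3
print(split_p(p), split_p(advance_p(p)))
print(1 << 22)                # addressable program words
```

Advancing past parcel 3 carries into the word address, and the 22-bit word address accounts for the 4 million word program size limit.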

New data enters the P register on an instruction branch or on an exchange 
sequence. (The exchange sequence is described under Exchange Mechanism 
later in this section.) The contents of P are then advanced sequentially 
until the next branch or exchange sequence. The value in the P register 
is stored directly into the terminating Exchange Package during an 
exchange sequence. 

The P register is not master cleared. The value stored in P might not be 
accurate during the deadstart sequence. 



NEXT INSTRUCTION PARCEL REGISTER 

The 16-bit NIP register holds a parcel of program code before it enters 
the Current Instruction Parcel (CIP) register. 

The NIP register is not master cleared. An undetermined instruction can 
issue during the master clear interval before the interrupt condition 
blocks data entry into the NIP register. 



CURRENT INSTRUCTION PARCEL REGISTER 

The 16-bit CIP register holds the instruction waiting to issue. The term 
issue indicates the transition of an instruction in CIP to its execution 
phase. If an instruction is a 2-parcel instruction, the CIP register 
holds the first parcel of the instruction and the Lower Instruction 
Parcel (LIP) register holds the second parcel. Issue of an instruction 
in CIP can be delayed until conflicting operations have been completed. 
Data arrives at the CIP register from the NIP register. Indicators 
making up the instruction are distributed to all modules that have mode 
selection requirements when the instruction issues. 

The control flags associated with the CIP register are master cleared; 
the register itself is not. An undetermined instruction can issue during 
the master clear sequence. 



LOWER INSTRUCTION PARCEL REGISTER 

The 16-bit LIP register holds the second parcel of a 2-parcel instruction 
at the time the first parcel of the 2-parcel instruction is in the CIP 
register. 



INSTRUCTION BUFFERS 

A CPU has four instruction buffers, each of which can hold 128 
consecutive 16-bit instruction parcels (figure 3-2). Instruction parcels 
are held in the buffers before being delivered to the NIP or LIP 
registers. 



Figure 3-2. Instruction Buffers 



The beginning instruction parcel in a buffer always has a word address 
that is a multiple of 40 octal (a parcel address that is a multiple of 
200 octal), allowing the entire range of addresses for instructions in 
a buffer to be defined by the high-order 17 bits of the parcel address. 
Each buffer has a 17-bit Beginning Address register (IBAR) containing 
this value. 

The Beginning Address registers are scanned each CP. If the high-order 
17 bits of the P register match one of the beginning addresses, an 
in-buffer condition exists and the proper instruction parcel is selected 
from that instruction buffer. An instruction parcel to be executed 
normally is sent to the NIP. The second parcel of a 2-parcel instruction 
is blocked, however, from entering the NIP register and is sent to the 
LIP register instead. The second parcel of the 2-parcel instruction 
becomes available when the first parcel issues from the CIP register. 
Simultaneously, an all-zero parcel is entered into the NIP register. 

On an in-buffer condition, if the instruction is in a different buffer 
than the previous instruction, a change of buffers occurs requiring a 
2-CP delay of the instruction reaching the NIP register. 

An out-of-buffer condition exists when the high-order 17 bits of the P 
register do not match any instruction buffer beginning address. When 
this condition occurs, instructions must be loaded from memory into one 
of the instruction buffers before execution can continue. A 2-bit 
counter determines the instruction buffer receiving the instructions. 
Each out-of-buffer condition causes the counter to be incremented by 1 so 
that the buffers are selected in rotation. 
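
The in-buffer test and the rotating buffer selection can be sketched as below. The class InstructionBuffers and its methods are hypothetical names, and whether the 2-bit counter is incremented before or after a buffer is chosen is an assumption of this model.

```python
class InstructionBuffers:
    """Four buffers of 128 parcels, each tagged by a 17-bit beginning address."""

    def __init__(self):
        self.ibar = [None] * 4   # None marks a buffer with no valid contents
        self.counter = 0         # 2-bit counter selecting the buffer to load

    @staticmethod
    def tag(p: int) -> int:
        # High-order 17 bits of the 24-bit parcel address in P.
        return (p & 0xFFFFFF) >> 7

    def lookup(self, p: int):
        """Return the index of the buffer holding parcel p, or None."""
        wanted = self.tag(p)
        for index, base in enumerate(self.ibar):
            if base == wanted:
                return index
        return None

    def reference(self, p: int):
        """Return (buffer_index, loaded); loaded is True when out of buffer."""
        index = self.lookup(p)
        if index is not None:
            return index, False              # in-buffer condition
        index = self.counter                 # out-of-buffer: load this buffer
        self.ibar[index] = self.tag(p)
        self.counter = (self.counter + 1) % 4
        return index, True


buffers = InstructionBuffers()
for p in (0, 127, 128, 256, 384, 512, 130):
    print(p, buffers.reference(p))
```

The fifth distinct 128-parcel block (parcel address 512) reuses buffer 0, so a later reference to parcel 0 would again be out of buffer.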

Buffers are loaded from memory at the rate of 8 words per CP. The first 
group of 32 parcels delivered to the buffer always contains the next 
instruction required for execution. For this reason, the branch 
out-of-buffer time is 16 CPs for 64-bank memories, providing memory is 
not busy (if busy, the branch fetch is delayed until the busy is 
resolved). Once the fetch proceeds, the remaining groups arrive at a 
rate of 32 parcels per CP and circularly fill the buffer. 

An exchange sequence voids the instruction buffers, preventing a match 
with the P register and causing the buffers to be loaded as needed. 

Forward and backward branching is possible within buffers. Branching 
does not cause reloading of an instruction buffer if the address of the 
instruction being branched to is within one of the buffers. Multiple 
copies of instruction parcels cannot occur in the instruction buffers. 
Because instructions are held in instruction buffers before issue and 
after (until the buffer is reloaded), self-modifying code should not be 
used. Also, because of independent data and instruction memory 
protection, self-modifying code may be impossible. As long as the 
address of the unmodified instruction is in an instruction buffer, the 
modified instruction in memory is not loaded into an instruction buffer. 

Although optimizing code segment lengths for instruction buffers is not a 
prime consideration when programming a CPU, the number and size of the 
buffers and the capability for forward and backward branching can be used 
to good advantage. Large loops containing up to 512 consecutive 
instruction parcels can be maintained in the four buffers. An 
alternative is for a main program sequence in one or two of the buffers 
to make repeated calls to short subroutines maintained in the other 
buffers. The program and subroutines remain undisturbed in the buffers 
as long as no out-of-buffer condition or exchange causes reloading of a 
buffer. 



EXCHANGE MECHANISM 

A CPU uses an exchange mechanism for switching instruction execution from 
program to program. This exchange mechanism involves the use of blocks 
of program parameters known as Exchange Packages and a CPU operation 
referred to as an exchange sequence. For the convenience of Cray 
Assembly Language (CAL) programmers, an alternate bit position 
representation is used when discussing the Exchange Package. The bits 
are numbered from left to right with bit 0 assigned to the 2^63 bit 
position. 
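
As an illustration of this numbering, the sketch below extracts a field from a 64-bit word, assuming bit 0 denotes the 2^63 position and bit 63 the 2^0 position. The helper name `field` is hypothetical and is used here only for illustration.

```python
def field(word, first, last):
    """Return bits first..last (inclusive) of a 64-bit word.

    Bits are numbered left to right: bit 0 is the 2**63 position.
    """
    width = last - first + 1
    shift = 63 - last                 # distance of the field's last bit from 2**0
    return (word >> shift) & ((1 << width) - 1)


# Example: a 24-bit value placed in bits 16 through 39 of a word.
word0 = 0o1234567 << (63 - 39)
p_value = field(word0, 16, 39)        # recovers 0o1234567
```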



EXCHANGE PACKAGE 

The Exchange Package is a 16-word block of data in memory associated with 
a particular computer program. The Exchange Package contains the basic 
parameters necessary to provide continuity from one execution interval 
for the program to the next. 

The Exchange Package contents are arranged in a 16-word block. The 
exchange sequence swaps data from memory to the operating registers and 
back to memory. This sequence exchanges data in an active Exchange 
Package residing in the operating registers with an inactive Exchange 
Package in memory. The address held in the Exchange Address (XA) register 
of the active Exchange Package specifies the memory address to be used for 
the swap. Data is exchanged and a new program execution interval is 
initiated by the exchange sequence. 

The contents of the B, T, V, VM, SB, ST, and SM registers are not swapped 
in the exchange sequence. Data in these registers must be stored and 
replaced as required by specific coding in the program supervising the 
object program execution or by any program that needs this data. (Refer 
to section 4 for descriptions of the operating registers and the VL 
register.) 
Figure 3-3. Exchange Package for a Four-processor System 

Field   Word    Bits            Description 

PN      0       0-1             Processor number 
E       0       2-3             Error type 
S       0       4-11            Syndrome bits 
P       0       16-39           Program Address register 
R       1       0-1             Read mode 
CSB     1       2-5             Read address 
VNU     2                       Vector not used 
ESVL    3                       Enable Second Vector Logical 
F       3       14-15; 31-39    Flag register 
XA      3       16-23           Exchange Address register 
VL      3       24-30           Vector Length register 
EAM     4                       Enhanced Addressing Mode 
DBA     4       16-33           Data Base Address 
PS      4       35              Program State 
CLN     4       37-39           Cluster Number 
DLA     5       16-33           Data Limit Address 
        0-7     40-63           Eight A register contents 
        8-15    0-63            Eight S register contents 


Processor number 

The contents of the 2-bit PN position in the Exchange Package indicate in 
which CPU the Exchange Package executed. This value is not read into the 
CPU; it is a constant inserted only into a package being stored. 



Memory error data 

Bit 36 (interrupt on correctable memory error bit) and bit 38 (interrupt 
on uncorrectable memory error bit) in the M register determine if memory 
error data is included in the Exchange Package. Error data, consisting of 
four fields of information, appears in the Exchange Package if bit 36 is 
set and a correctable memory error is encountered or if bit 38 is set and 
an uncorrectable memory error is detected.* 

* For multiple bit memory errors, the hardware always sets the 
  correctable Memory Error flag in the interrupted Exchange Package. 
Memory error data fields are described below. 

Field              Description 

Error type (E)     The type of memory error encountered, uncorrectable 
                   or correctable, is indicated in word 0, bits 2 and 
                   3 of the Exchange Package. Bit 2 is set for an 
                   uncorrectable memory error; bit 3 is set for a 
                   correctable memory error. 

Syndrome (S)       The 8 S bits used in detecting a memory data error 
                   are returned in word 0, bits 4 through 11 of the 
                   Exchange Package. Refer to section 2 for 
                   additional information. 

Read mode (R)      Indicates the read mode in progress when a memory 
                   data error occurred and is in word 1, bits 0 and 1 
                   of the Exchange Package. These bits assume the 
                   following values: 

                   00   I/O 
                   01   Scalar (memory references with A or S) 
                   10   Vector, B, or T 
                   11   Instruction fetch or exchange 

Read address       The 10-bit CSB field contains the address where a 
(CSB)              memory data error has occurred. Word 1, bits 6 
                   through 11 of the Exchange Package contain bits 
                   2^5 through 2^0 of the address and can be 
                   considered the bank (B) address (refer to figure 
                   3-4). In the 16K chip, word 1, bits 2 through 5, 
                   of the Exchange Package contain bits 2^22 through 
                   2^20 of the address and can be considered the chip 
                   select (CS). In the 64K chip, the chip select is 
                   contained in bits 2^23 and 2^22. 



Exchange Package Bits 
2^2   2^3   2^4   2^5   2^6   2^7   2^8   2^9   2^10  2^11 
2^23  2^22  2^21  2^20  2^5   2^4   2^1   2^0   2^3   2^2 



Chip Select Chip Select Bank Address Bits 

for 64K for 16K 



Figure 3-4. Read Address (CSB) Bits (64 Banks) 
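
The memory error fields described above can be pulled out of words 0 and 1 of a stored Exchange Package as sketched below. This is a minimal illustration assuming bit 0 is the 2^63 position; the function and key names are hypothetical, and the chip select and bank bits are returned raw, without applying the bit ordering shown in figure 3-4.

```python
READ_MODES = {
    0b00: "I/O",
    0b01: "scalar (A or S)",
    0b10: "vector, B, or T",
    0b11: "instruction fetch or exchange",
}


def field(word, first, last):
    """Bits first..last of a 64-bit word, with bit 0 at the 2**63 position."""
    return (word >> (63 - last)) & ((1 << (last - first + 1)) - 1)


def decode_memory_error(word0, word1):
    """Collect the E, S, R, and CSB fields into a dictionary."""
    return {
        "uncorrectable": bool(field(word0, 2, 2)),   # E, bit 2
        "correctable": bool(field(word0, 3, 3)),     # E, bit 3
        "syndrome": field(word0, 4, 11),             # S, 8 bits
        "read_mode": READ_MODES[field(word1, 0, 1)], # R, 2 bits
        "chip_select_raw": field(word1, 2, 5),       # CSB, word 1 bits 2-5
        "bank_raw": field(word1, 6, 11),             # CSB, word 1 bits 6-11
    }
```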
Program Address register 

The P register contents (address of first program instruction not yet 
issued) are stored in bits 16 through 39 of word 0 (maximum program size 
is 4 million words). The instruction at this location is the first 
instruction to be issued when this program begins again. 



Memory field registers 

Each object program has a designated field of memory for instructions and 
data that is specified by the monitor program when the object program is 
loaded and initiated. All memory addresses contained in the object 
program code are relative to one of two base addresses specifying the 
beginning of the appropriate field and are limited in size. Each object 
program reference to memory is checked against the limit and base 
addresses to determine if the address is within the bounds assigned. 
These field limits are contained in four registers that are saved in the 
Exchange Package. The four registers are: the Instruction Base Address 
(IBA) register, the Instruction Limit Address (ILA) register, the Data 
Base Address (DBA) register, and the Data Limit Address (DLA) register. 
Refer to the subsection on Memory Field Protection later in this section 
for an explanation of the registers. 



Mode register 

The 10-bit M register contains part of the Exchange Package for a 
currently active program. The M register bits are assigned in words 1 
and 2 of the Exchange Package as follows: 

Word 1 

Bit Description 

35 Waiting for Semaphore (WS) flag; when set, the CPU exchanged 
when a test and set instruction was holding in the CIP 
register. 

36 Floating-point Error Status (FPS) flag; when set, a 
floating-point error has occurred regardless of the state of 
the Floating-point Error Mode flag. 

37 Bidirectional Memory Mode (BDM) flag; when set, block reads 
and writes can operate concurrently. 

38 Selected for External Interrupts (SEI) flag; when set, this 
CPU is preferred for I/O interrupts. 

39 Interrupt Monitor Mode (IMM) flag; when set, it enables all 
interrupts in monitor mode except PC, MCU, I/O, and ICP. 
Word 2 

Bit Description 

35 Operand Range Error Mode (IOR) flag; when set, it enables 
interrupts on operand address range errors. 

36 Correctable Memory Error Mode (ICM) flag; when set, it 
enables interrupts on correctable memory data errors. 

37 Floating-point Error Mode (IFP) flag; when set, it enables 
interrupts on floating-point errors. 

38 Uncorrectable Memory Error Mode (IUM) flag; when set, it 
enables interrupts on uncorrectable memory data errors. 

39 Monitor Mode (MM) flag; when set, it inhibits all interrupts 
except memory errors, normal, and error exit. 

The 10 bits are set selectively during an exchange sequence. 

Word 1, bit 37, (Bidirectional Memory Mode flag) can be set or cleared by 
using instructions 0026 (enable bidirectional memory transfers) and 0025 
(disable bidirectional memory transfers). 

Word 2, bit 35, (Operand Range Error Mode flag) can be set or cleared 
during the execution interval of a program by using instructions 0023 
(enable interrupt on operand address range error) and 0024 (disable 
interrupt on operand address range error). 

Word 2, bit 37, (Floating-point Error Mode flag) can be set or cleared 
during the execution interval for a program by using instructions 0021 
(enable interrupt on floating-point error) and 0022 (disable interrupt on 
floating-point error). 

Word 1, bits 36 and 37, and word 2, bits 35 and 37, can be read with 
instruction 073i01. Word 1, bits 35 and 36 indicate the state of the CPU 
at the time of the exchange. The remaining bits are not altered during 
the execution interval for the Exchange Package and can be altered only 
when the Exchange Package is inactive in storage. 
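
The bit assignments listed above can be checked against a stored Exchange Package with a sketch such as the following. It assumes bit 0 is the 2^63 position; the function name and the use of a Python set are illustrative choices only.

```python
M_WORD1 = {35: "WS", 36: "FPS", 37: "BDM", 38: "SEI", 39: "IMM"}
M_WORD2 = {35: "IOR", 36: "ICM", 37: "IFP", 38: "IUM", 39: "MM"}


def bit(word, n):
    """Bit n of a 64-bit word, with bit 0 at the 2**63 position."""
    return (word >> (63 - n)) & 1


def decode_mode(word1, word2):
    """Return the names of the M register flags set in words 1 and 2."""
    flags = {name for n, name in M_WORD1.items() if bit(word1, n)}
    flags |= {name for n, name in M_WORD2.items() if bit(word2, n)}
    return flags
```

For example, a package with word 2, bit 39 set decodes with "MM" in the result, indicating a package that executes in monitor mode.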



Vector not used (VNU) 

The state of the VNU position in the Exchange Package indicates whether 
or not instructions 076, 077, or 140 through 177 were issued during the 
execution interval. If none of the instructions were issued, the bit 
remains set. If one or more of the instructions was issued, the bit is 
cleared. Once cleared, the bit remains clear until reset through a 
memory store to the dormant Exchange Package. 
Enable Second Vector Logical (ESVL) 

The state of the ESVL position in the Exchange Package indicates whether 
or not the Second Vector Logical unit can be used. If set, instructions 
140 through 145 may select the Second Vector Logical unit. If clear, the 
Second Vector Logical unit cannot be used; only the Full Vector Logical 
unit may be used. 



Flag register 

The 11-bit F register contains part of the Exchange Package for the 
currently active program. This register is located in word 3 and 
contains 11 flags individually identified within the Exchange Package. 
Setting any of these flags interrupts program execution. When one or 
more flags are set, a Request Interrupt signal is sent to initiate an 
exchange sequence. The F register contents are stored along with the 
rest of the Exchange Package. The monitor program can analyze the 11 
flags for the cause of the interruption. Before the monitor program 
exchanges back to the package, it must clear the flags in the F register 
area of the package. If any bit remains set, another exchange occurs 
immediately. 

The F register bits are assigned in word 3 of the Exchange Package as 
follows: 

Word 3 

Bit Description 

14 Interrupt From Internal CPU (ICP) flag; set when another 
CPU issues instruction 001401. 

15 Deadlock (DL) flag; set when all CPUs in a cluster are 
holding issue on a test and set instruction. 

31 Programmable Clock Interrupt (PCI) flag; set when the 
interrupt countdown counter in the programmable clock equals 
0. The programmable clock is explained later in this section. 

32 MCU Interrupt (MCU) flag; set when the MIOP sends this signal. 

33 Floating-point Error (FPE) flag; set when a floating-point 
range error occurs in any of the floating-point functional 
units and the Enable Floating-point Interrupt flag is set. 
Section 4, Computation, explains floating-point functional 
units. 

34 Operand Range Error (ORE) flag; set when a data reference is 
made outside the boundaries of the Data Base Address (DBA) 
and Data Limit Address (DLA) registers and the Enable Operand 
Range Interrupt flag is set. Operand range error is 
explained later in this section. 

35 Program Range Error (PRE) flag; set when an instruction fetch 
is made outside the boundaries of the Instruction Base 
Address (IBA) and Instruction Limit Address (ILA) registers. 
Program range error is explained later in this section. 

36 Memory Error (ME) flag; set when a correctable or 
uncorrectable memory error occurs and the corresponding 
enable memory error mode bit is set in the M register. 

37 I/O Interrupt (IOI) flag; set when a 6 Mbyte channel or the 
1250 Mbyte channel completes a transfer. 

38 Error Exit (EEX) flag; if not in MM, set by an error exit 
instruction (000). 

39 Normal Exit (NEX) flag; if not in MM and IMM, set by a normal 
exit instruction (004). 

Any flag (except the ME flag) can be set in the F register only if the 
active Exchange Package is not in monitor mode; that is, such flags are 
set only if word 2, bit 39, of the M register is 0. Except for the ME 
flag, if the program is in monitor mode and the conditions for setting 
an F register flag are present, the flag remains cleared and no exchange 
sequence is initiated. 
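
A monitor program's analysis of the F register area might look like the sketch below. It is an assumption-laden illustration, not the operating system's code: bit 0 is taken as the 2^63 position, and the function names are hypothetical. `clear_flags` models the requirement that all flags be cleared in the stored package before exchanging back to it.

```python
F_FLAGS = {
    14: "ICP", 15: "DL", 31: "PCI", 32: "MCU", 33: "FPE", 34: "ORE",
    35: "PRE", 36: "ME", 37: "IOI", 38: "EEX", 39: "NEX",
}


def decode_flags(word3):
    """Names of the F register flags set in word 3, in bit order."""
    return [name for n, name in sorted(F_FLAGS.items())
            if (word3 >> (63 - n)) & 1]


def clear_flags(word3):
    """Return word 3 with all 11 F register bits cleared."""
    mask = 0
    for n in F_FLAGS:
        mask |= 1 << (63 - n)
    return word3 & ~mask & (2**64 - 1)
```

Fields that share word 3, such as XA in bits 16 through 23, are left untouched by `clear_flags`.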



Exchange Address register 

The 8-bit XA register specifies the first word address (FWA) of a 16-word 
Exchange Package loaded by an exchange operation. The register contains 
the high-order 8 bits of a 12-bit field specifying the address. The 
low-order bits of the field are always 0; an Exchange Package must begin 
on a 16-word boundary. The 12-bit limit requires that the absolute 
address be in the lower 4096 (10000 octal) words of memory. 

When an execution interval terminates, the exchange sequence exchanges 
the contents of the registers with the contents of the Exchange Package 
at the beginning address (XA) in memory. 
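
The relationship between the XA register and the package address reduces to a 4-bit shift, as this short sketch shows. The function name is hypothetical.

```python
def exchange_package_address(xa):
    """First word address of the 16-word Exchange Package selected by XA."""
    if not 0 <= xa <= 0xFF:
        raise ValueError("XA is an 8-bit register")
    return xa << 4          # low-order 4 bits of the 12-bit address are 0


# The highest package starts at 0o7760 and ends at word 0o7777,
# the last word of the lower 4096 words of memory.
last_fwa = exchange_package_address(0xFF)
```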
Enhanced Addressing Mode (EAM) 

The state of the EAM position in the Exchange Package indicates whether 
or not address extension occurs for address calculations. If set, 
instructions 100 through 137 will sign-extend the 22-bit value (jkm) to 
24 bits for address calculations (compatible with an 8-million- or 
16-million-word system). If clear, instructions 100 through 137 
(not I/O) have address bits 2^22 and 2^23 replaced by Data Base Address 
bits 2^22 and 2^23. 
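
The sign extension performed when EAM is set can be illustrated as follows. Only the EAM-set case is sketched; the EAM-clear substitution of Data Base Address bits is not modeled, and the function name is hypothetical.

```python
def sign_extend_22_to_24(jkm):
    """Extend a 22-bit two's complement value to 24 bits."""
    jkm &= 0x3FFFFF              # keep the 22-bit jkm value
    if jkm & 0x200000:           # bit 2**21 is the sign bit
        jkm |= 0xC00000          # copy it into bits 2**22 and 2**23
    return jkm
```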



Data Base Address register 

Refer to the memory field register subsection for register explanation. 

Program State register 

The state of the 1-bit Program State (PS) register is manipulated by the 
operating system to represent different program states in the CPUs 
concurrently processing a single program. 

Cluster Number register 

The 3-bit Cluster Number (CLN) register determines the CPU's cluster. 
The CLN register contents are used to determine which set of SB, ST, and 
SM registers the CPU can access. If the CLN register is 0, then the CPU 
does not have access to SB, ST, or SM registers. The CLN registers 
contents in all CPUs are also used to determine the condition necessary 
for a deadlock interrupt. 

Data Limit Address register 

Refer to the Memory field register subsection for register explanation. 

A registers 

The current contents of all A registers are stored in bits 40 through 63 
of words 0 through 7 during exchange. 

S registers 

The current contents of all S registers are stored in bits 0 through 63 
of words 8 through 15 during exchange. 
ACTIVE EXCHANGE PACKAGE 

An active Exchange Package resides in the operating registers. The 
interval of time when the Exchange Package and the program associated 
with it are active is called the execution interval. An execution 
interval begins with an exchange sequence where the subject Exchange 
Package moves from memory to the operating registers. An execution 
interval ends as the Exchange Package moves back to memory in a 
subsequent exchange sequence. 



EXCHANGE SEQUENCE 

The exchange sequence is the vehicle for moving an inactive Exchange 
Package from memory into the operating registers. Simultaneously, the 
exchange sequence moves the currently active Exchange Package from the 
operating registers back into memory. This swapping operation is done in 
a fixed sequence when all computational activity associated with the 
currently active Exchange Package has stopped. The same 16-word block of 
memory is used as the source of the inactive Exchange Package and the 
destination of the currently active Exchange Package. Location of this 
block is specified by the XA register contents and is a part of the 
currently active Exchange Package. The exchange sequence can be 
initiated by deadstart sequence, Interrupt flag set, or program exit. 



Exchange initiated by deadstart sequence 

The deadstart sequence forces the XA register contents to 0 for all CPUs 
and also forces an interrupt in one CPU. These two actions cause an 
exchange using memory address 0 as the location of the Exchange Package. 
The inactive Exchange Package at address 0 then moves into the operating 
registers and initiates a program using these parameters. The Exchange 
Package swapped to address 0 is largely indeterminate because of the 
deadstart operation. New data entered at these storage addresses then 
discards the old Exchange Package in preparation for starting subsequent 
CPUs with an interprocessor interrupt. 

When instruction 0014j1 (IP) is issued in the first CPU, the CPU 
associated with processor number j exchanges to address 0 in memory. 
(A set of switches on the mainframe's control panel associates processor 
number with CPU number and selects which CPU is deadstarted first.) 



Exchange initiated by Interrupt flag set 

An exchange sequence can be initiated by setting any one of the Interrupt 
flags in the F register. Setting of one or more flags causes a Request 
Interrupt signal to initiate an exchange sequence. 
Exchange initiated by program exit 

Two program exit instructions initiate an exchange sequence. Timing of 
the instruction execution is the same in either case; the difference is 
determined by which of the two flags is set in the F register. The two 
instructions are: 

Octal Code CAL Syntax Description 

000 ERR Error exit 

004 EX Normal exit 

The two exits enable a program to request its own termination. A 
nonmonitor (object) program usually uses the normal exit instruction to 
exchange back to the monitor program. The error exit allows for abnormal 
termination of an object program. The exchange address selected is the 
same as for a normal exit. 

Each instruction has a flag in the F register. The appropriate flag is 
set if the currently active Exchange Package is not in monitor mode. The 
inactive Exchange Package called in this case is normally one that 
executes in monitor mode. Flags are checked for evaluation of the 
program termination cause. 

The monitor program selects an inactive Exchange Package for activation 
by setting the address of the inactive Exchange Package in the XA 
register and then executing a normal exit instruction. 



Exchange sequence issue conditions 

The following are hold issue conditions, execution time, and special 
cases for an exchange sequence. 

Hold conditions: 

• NIP register contains a valid instruction 

• S, V, or A registers busy 

Execution time: 

For 64 banks, 40 CPs; consists of an exchange sequence (24 CPs) and a 
fetch operation (16 CPs). 

Special cases: 

If a test and set instruction is holding in the CIP register, both 
CIP and NIP registers are cleared and the exchange occurs with the WS 
flag set and the P register pointing to the test and set instruction. 
EXCHANGE PACKAGE MANAGEMENT 

Each 16-word Exchange Package resides in an area defined during system 
deadstart. The defined area must lie within the lower 4096 (10000 octal) 
words of memory. The package at address 0 is the deadstart monitor 
program's Exchange Package. Other packages provide for object programs 
and monitor tasks. Nonmonitor packages lie outside of the field lengths 
for the programs they represent as determined by the base and limit 
addresses for the programs. Only the monitor program has a field defined 
so that it can access all of memory, including Exchange Package areas. 

The defined field allows the monitor program to define or alter all 
Exchange Packages other than its own when it is the currently active 
Exchange Package. Since no interlock exists between an exchange sequence 
in a CPU and memory transfers in another CPU, modification of Exchange 
Packages which can be used by another CPU should be avoided, except under 
software controlled situations. 

Proper management of Exchange Packages dictates that a nonmonitor program 
always exchanges back to the monitor program that exchanged to it. The 
exchange ensures that the program information is always exchanged into 
its proper Exchange Package. 

For example, the monitor program (A) begins an execution interval 
following deadstart. No interrupts (except memory) can terminate its 
execution interval since it is in monitor mode. Program A voluntarily 
exits by issuing a normal exit instruction (004). Before doing so, 
however, program A sets the XA register contents to point to the user 
program (B) Exchange Package so that program B is the next program to 
execute. Program A sets the exchange address in program B's Exchange 
Package to point back to program A. 

The exchange sequence to program B causes the exchange address from 
program B's Exchange Package to be entered in the XA register. 
Concurrently, the exchange address in the XA register goes to program B's 
Exchange Package area with all other program parameters for program A. 
When the exchange is complete, program B begins its execution interval. 

To illustrate the exchange sequence, assume that while program B is 
executing, an Interrupt flag sets initiating an exchange sequence. Since 
program B cannot alter the XA register, the exit is back to program A. 
Program B's parameters exchange back into its Exchange Package area; 
program A's parameters held in program B's package area during the 
execution interval exchange back into the operating registers. 

Program A, upon resuming execution, determines an interrupt has caused 
the exchange and sets the XA register to call the proper interrupt 
processor into execution. To do this, program A sets XA to point to the 
Exchange Package for the interrupt processing program (C). Program A 
clears the interrupt and initiates execution of program C by executing a 
normal exit instruction (004). Depending on the operating task, program 
C can execute in monitor mode or in user mode. 
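
The hand-off between programs A and B described above can be traced with a small model. This is a sketch under simplifying assumptions: an Exchange Package is reduced to a name and an XA value, memory is a dictionary keyed by first word address, and the class name `CPU` and the address 0o100 are hypothetical choices for illustration.

```python
class CPU:
    """Holds the active Exchange Package and swaps it with memory."""

    def __init__(self, active, memory):
        self.active = active
        self.memory = memory

    def exchange(self):
        # XA of the ACTIVE package selects the 16-word block used for the swap.
        addr = self.active["xa"] << 4
        self.memory[addr], self.active = self.active, self.memory[addr]


B_AREA = 0o100                                   # hypothetical package address
memory = {B_AREA: {"name": "B", "xa": B_AREA >> 4}}  # B's XA points back at its own area
cpu = CPU({"name": "A", "xa": 0}, memory)

cpu.active["xa"] = B_AREA >> 4    # program A selects B, then issues a normal exit
cpu.exchange()                    # B is now active; A's parameters sit in B's area
cpu.exchange()                    # interrupt in B: B cannot alter XA, so A returns
```

Because B's XA value names B's own package area, any exit from B restores program A and returns B's parameters to where they came from.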
MEMORY FIELD PROTECTION 

At execution time, each object program has a designated field of memory 
for instructions and data. The field limits are specified by the monitor 
program when the object program is loaded and initiated. The fields can 
begin at any word address that is a multiple of 64 (that is, 100 octal) and 
can continue to another address that is one less than a multiple of 64. 
The fields can overlap. 

All memory addresses contained in the object program code are relative to 
one of the two base addresses specifying the beginning of the appropriate 
field. An object program cannot read or alter any memory location with 
an absolute address lower than that base address. Each object program 
reference to memory is checked against the limit and base addresses to 
determine if the address is within the bounds assigned. A memory read 
reference beyond the assigned field limits issues and completes, but a 
zero value is transferred from memory. A memory write reference beyond 
the assigned field limits is allowed to issue, but no write occurs. 

Field limits are contained in four registers: the Instruction Base 
Address (IBA) register, the Instruction Limit Address (ILA) register, the 
Data Base Address (DBA) register, and the Data Limit Address (DLA) 
register. The following paragraphs describe these four registers and 
flags associated with the field limits. 



INSTRUCTION BASE ADDRESS REGISTER 

The IBA register holds the base address of the user's instruction field. 
An instruction can only be executed by the CPU if the absolute address at 
which the instruction is located is greater than or equal to the contents 
of the current Exchange Package IBA register of the program executing. 
This determination is made at instruction buffer fetch time by the CPU. 

The IBA register contents are interpreted as the high-order 18 bits of a 
24-bit memory address. The low-order 6 bits of the address are assumed 
to be 0 because of the number of banks (64 decimal). Absolute memory 
addresses for an instruction fetch are formed by adding the IBA register 
to the P register (high-order 22 bits) modulo two to the twenty-fourth 
power. 

A reference to an absolute address less than the address defined by IBA 
can only occur through a jump or branch instruction to an address beyond 
the memory capacity of the machine. 




INSTRUCTION LIMIT ADDRESS REGISTER 

The ILA register holds the limit address of the user's field. An 
instruction can only be executed by the CPU if the absolute address where 
it is located is less than the contents of the current Exchange Package 
ILA register of the program executing. This determination is made at 
instruction buffer fetch time by the CPU. 

The ILA register contents are interpreted as the high-order 18 bits of a 
24-bit memory address. The low-order 6 bits of the address are assumed 
to be 0 because of the number of banks (64 decimal). The largest 
absolute address that can be executed by a program is defined by 
[(ILA) x 2^6] - 1. 

If the final absolute address of the instruction buffer fetch as computed 
by the CPU does not fall between the range of addresses contained within 
the currently executing Exchange Package IBA and ILA registers, the CPU 
generates a program range error interrupt. 
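
The instruction fetch check can be sketched as below. The sketch assumes P is a 24-bit parcel address whose high-order 22 bits form a word address, and that IBA and ILA are 18-bit values; the function name is hypothetical.

```python
def instruction_fetch_address(p, iba, ila):
    """Return (absolute word address, in_range) for a fetch at parcel address p."""
    base = (iba & 0x3FFFF) << 6            # low-order 6 bits assumed 0
    limit = (ila & 0x3FFFF) << 6
    word = p >> 2                          # high-order 22 bits of P
    absolute = (base + word) % (1 << 24)   # modulo two to the twenty-fourth power
    in_range = base <= absolute < limit    # largest legal address is limit - 1
    return absolute, in_range
```

With IBA = 1 and ILA = 2, the program occupies absolute words 64 through 127; a fetch at relative word 64 falls outside the field and would raise a program range error.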



DATA BASE ADDRESS REGISTER 

The DBA register holds the base address of the user's data field. An 
operand can only be fetched or stored by the CPU if the absolute address 
where the operand is located is greater than or equal to the contents of 
the current Exchange Package DBA register of the program executing. This 
determination is made each time an operand is fetched or stored by the 
CPU. 

The DBA register contents are interpreted as the high-order 18 bits of a 
24-bit memory address. The low-order 6 bits of the DBA register are 
assumed to be 0. Absolute memory addresses for operands are formed by 
adding the DBA register to the modified operand address modulo two to the 
twenty-fourth power. 



DATA LIMIT ADDRESS REGISTER 

The DLA register holds the (upper) limit address of the user's data 
field. An operand can only be fetched or stored by the CPU if the 
absolute address where the operand is located is less than the contents 
of the current Exchange Package DLA register of the program executing. 
This determination is made each time an operand is fetched or stored by 
the CPU. 

The DLA register contents are interpreted as the high-order 18 bits of a 
24-bit memory address. The low-order 6 bits of the DLA register are 
assumed to be 0. The largest absolute address that can be referenced for 
data by a program is defined by [(DLA) x 2^6] - 1. 
If the final absolute address of the operand as computed by the CPU does 
not fall between the range of addresses contained within the currently 
executing Exchange Package DBA and DLA registers, the CPU generates an 
operand (address) range error interrupt. 
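
The operand check, together with the behavior of out-of-range references described under Memory Field Protection (reads return zero, writes are suppressed), can be modeled as follows. This is a simplified sketch: memory is a dictionary, the class name is hypothetical, and the `out_of_range_seen` attribute merely records that a violation occurred; it does not model the Operand Range Error Mode flag that gates the interrupt.

```python
class DataField:
    """Toy model of DBA/DLA checking for operand references."""

    def __init__(self, dba, dla, memory):
        self.base = (dba & 0x3FFFF) << 6     # low-order 6 bits assumed 0
        self.limit = (dla & 0x3FFFF) << 6
        self.memory = memory
        self.out_of_range_seen = False

    def _absolute(self, address):
        absolute = (self.base + address) % (1 << 24)
        ok = self.base <= absolute < self.limit
        if not ok:
            self.out_of_range_seen = True
        return absolute, ok

    def read(self, address):
        absolute, ok = self._absolute(address)
        return self.memory.get(absolute, 0) if ok else 0   # zero on violation

    def write(self, address, value):
        absolute, ok = self._absolute(address)
        if ok:                                             # no write on violation
            self.memory[absolute] = value
```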



PROGRAM RANGE ERROR 

The Program Range Error flag sets if a memory reference outside the 
boundaries of the IBA and ILA registers is for an instruction fetch. An 
out-of-range memory reference can occur in a nonmonitor mode program on a 
branch or jump instruction calling for a program address above or below 
the limits. The Program Range Error flag causes an error condition that 
terminates program execution. The monitor program checks the state of 
the Program Range Error flag and takes appropriate action, perhaps 
aborting the user program. 



OPERAND RANGE ERROR 

The Operand Range Error flag sets if the Operand Range Error Mode flag is 
set and a memory reference outside the boundaries of the DBA and DLA 
registers is called to read or write an operand for an A, B, S, T, or V 
register. The Operand 
Range Error flag causes an error condition that terminates the user 
program execution. The monitor program checks the state of the Operand 
Range Error flag and takes appropriate action, perhaps aborting the user 
program. 



PROGRAMMABLE CLOCK 

The programmable clock can be used to accurately measure the duration of 
intervals. Intervals selected under monitor program control generate a 
periodic interrupt. Clock frequencies and intervals are as follows: 

CPU Speed Frequency Interval 

8.5-ns CP      117 MHz      8.5 ns through 36.5 s 
9.5-ns CP      105 MHz      9.5 ns through 40.8 s 

Intervals shorter than 100-ms are not practical due to the monitor 
overhead involved in processing the interrupt. Supporting the 
programmable clock are the Interrupt Interval (II) register, the 
Interrupt Countdown (ICD) counter, and four monitor mode instructions. 
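
The frequency and maximum interval figures follow from the clock period and a 32-bit interval count, as this short calculation shows. The function name is hypothetical; the 32-bit width is that of the II register and ICD counter.

```python
def clock_range(cp_ns):
    """Return (frequency in MHz, longest interval in seconds) for a clock period."""
    frequency_mhz = 1e3 / cp_ns                  # 1 / (cp_ns * 1e-9) Hz, in MHz
    max_interval_s = (2 ** 32) * cp_ns * 1e-9    # 32-bit count of clock periods
    return frequency_mhz, max_interval_s


freq_85, longest_85 = clock_range(8.5)   # about 117.6 MHz and 36.5 s
freq_95, longest_95 = clock_range(9.5)   # about 105.3 MHz and 40.8 s
```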
INSTRUCTIONS 

Four monitor mode instructions support the programmable clock: 

Octal Code    CAL Syntax    Description 

0014j4        PCI Sj        Enters Interrupt Interval (II) register 
                            with (Sj) 

001405        CCI           Clears the programmable clock interrupt 
                            request 

001406        ECI           Enables the programmable clock interrupt 
                            request 

001407        DCI           Disables the programmable clock 
                            interrupt request 



INTERRUPT INTERVAL REGISTER 

The 32-bit II register can be loaded with a binary value equal to the 
number of CPs that are to elapse between programmable clock interrupt 
requests. The interrupt interval is transferred from the low-order 32 
bits of the Sj register into the II register and the ICD counter when 
instruction 0014j4 is executed. 

This value is held in the II register and is transferred to the ICD 
counter each time the counter reaches 0 and generates an interrupt 
request. The II register contents are changed only by another instruction 
0014j4. 



INTERRUPT COUNTDOWN COUNTER 

The 32-bit ICD counter is preset to the II register contents when 
instruction 0014j4 is executed. This counter runs continuously but 
counts down, decrementing by 1 each CP until the content of the counter 
is 0. The ICD sets the programmable clock interrupt request and samples 
the interval value held in the II register. The ICD repeats the 
countdown to zero cycle, setting the programmable clock interrupt request 
at regular intervals determined by the interval value. 

When the programmable clock interrupt request is set, it remains set 
until the clear programmable clock interrupt request instruction (001405) 
is executed. A programmable clock interrupt request can be set only after 
the enable programmable clock interrupt request instruction (001406) is 
executed. A programmable clock interrupt request causes an interrupt only 
when the CPU is not in monitor mode. A request set in monitor mode is 
held until the system switches to user mode. 
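
The interaction of the II register, the ICD counter, and the four instructions can be pictured with a small software model. This Python sketch is not cycle-accurate; the class and method names are hypothetical, and the exact CP on which the request is raised is an assumption chosen so that requests are spaced by the II value.

```python
class ProgrammableClock:
    """Toy model of the II register and ICD counter (not cycle-accurate)."""

    def __init__(self):
        self.ii = 0           # Interrupt Interval register
        self.icd = 0          # Interrupt Countdown counter
        self.enabled = False  # set by ECI (001406), cleared by DCI (001407)
        self.request = False  # programmable clock interrupt request

    def pci(self, sj):
        """0014j4: load II and ICD from the low-order 32 bits of Sj."""
        self.ii = self.icd = sj & 0xFFFFFFFF

    def cci(self):
        """001405: clear the interrupt request."""
        self.request = False

    def eci(self):
        """001406: allow the request to be set."""
        self.enabled = True

    def dci(self):
        """001407: prevent the request from being set."""
        self.enabled = False

    def tick(self):
        """Advance one clock period."""
        if self.icd > 0:
            self.icd -= 1
        if self.icd == 0:
            if self.enabled:
                self.request = True
            self.icd = self.ii  # reload from II and keep counting

    def interrupts(self, monitor_mode):
        """A set request interrupts only when not in monitor mode."""
        return self.request and not monitor_mode


def run(clock, cps):
    """Tick `cps` times, clearing each request; return the CPs that fired."""
    fired = []
    for cp in range(1, cps + 1):
        clock.tick()
        if clock.request:
            fired.append(cp)
            clock.cci()
    return fired
```

With an interval of 5 loaded and the request enabled, `run(clock, 20)` reports requests at CPs 5, 10, 15, and 20 in this model.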



CLEAR PROGRAMMABLE CLOCK INTERRUPT REQUEST 

Following a program interrupt interval, an active programmable clock 
interrupt request can be cleared by executing instruction 001405. 

Following any deadstart, the monitor program should ensure the state of 
the programmable clock interrupt by issuing instructions 001405 and 
001407. 



PERFORMANCE MONITOR 

The system contains a set of eight performance counters to track certain 
hardware-related events that can be used to indicate relative 
performance. The events that can be tracked are the number of specific 
instructions issued, hold issue conditions, fetches, and references; they 
are selected through instruction 0015j0. Refer to appendix C for 
complete information on performance monitoring. 



DEADSTART SEQUENCE 

The deadstart sequence of operations starts a program running in the 
mainframe after power has been turned off and then turned on again or 
whenever the operating system is to be reinitialized in the mainframe. 
All registers in the machine, all control latches, and all words in 
memory should be considered invalid after power has been turned on. The 
IOS initiates the following sequence of operations to begin the program: 

1. Turns on Master Clear signal 

2. Turns on I/O Clear signal 

3. Turns off I/O Clear signal 

4. Loads memory via IOS 

5. Turns off Master Clear signal 

The Master Clear signal halts all internal computation and forces 
critical control latches to predetermined states. The I/O Clear signal 
clears the input CA register of the MCU channel and activates the MCU 
input channel. All other input channels remain inactive. The IOS then 
loads an initial Exchange Package and monitor program. The Exchange 
Package must be located at address 0 in memory. Turning off the Master 
Clear signal initiates the exchange sequence to read this package and to 
begin execution of the monitor program in CPU 0 (PN=0). 

The other CPUs remain in a master-cleared state until instruction 
0014j1 (IP) is issued in the CPU with PN=0. Then the CPU with PN=j 
exchanges to address 0 in memory. 

Because the exchange of CPU 0 overwrites the contents of the inactive 
Exchange Package at address 0, CPU 0 must reinitialize the Exchange 
Package at address 0 before allowing other CPUs to start. (Any CPU can 
be started first by using a switch on the control panel.) Subsequent 
actions are dictated by the design of the operating system. 



CPU COMPUTATION SECTION 



Each CPU contains an identical, independent computation section. A 
computation section consists of operating registers and functional units 
associated with three types of processing: address, scalar, and vector. 
Address processing operates on internal control information, such as 
addresses and indexes, and has two levels of 24-bit registers and two 
integer arithmetic functional units. Scalar and vector processing are 
performed on data. 

A vector is an ordered set of elements. A vector instruction operates on 
a series of elements repeating the same function and producing a series 
of results. Scalar processing starts an instruction, handles one operand 
or operand pair, then produces a single result. 

The main advantage of vector over scalar processing is eliminating 
instruction start-up time for all but the first operand. Scalar 
processing has two levels of 64-bit scalar registers, four functional 
units dedicated solely to scalar processing, and three floating-point 
functional units shared with vector operations. Vector processing has a 
set of 64-element registers of 64 bits each, five functional units 
dedicated solely to vector applications, and three floating-point 
functional units supporting both scalar and vector operations. 

Address information flows from Central Memory or from control registers 
to address registers. Information in the address registers is 
distributed to various parts of the control network for use in 
controlling the scalar, vector, and I/O operations. The address 
registers can also supply operands to two integer functional units. The 
units generate address and index information and return the result to the 
address registers. Address information can also be transmitted to 
Central Memory from the address registers. 

Data flow in a computation section is from Central Memory to registers 
and from registers to functional units. Results flow from functional 
units to registers and from registers to Central Memory or back to 
functional units. Data flows along either the scalar or vector path, 
depending on the processing mode. An exception is that scalar registers 
can provide one required operand for vector operations performed in the 
vector functional units. 






The computation section performs integer or floating-point arithmetic 
operations. Integer arithmetic is performed in twos complement mode. 
Floating-point quantities have signed magnitude representation. 

Floating-point instructions provide for addition, subtraction, 
multiplication, and reciprocal approximation. The reciprocal 
approximation instructions provide for a floating-point divide operation 
using a multiple instruction sequence. These instructions produce 64-bit 
results (1-bit sign, 15-bit exponent, and 48-bit normalized coefficient). 
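
A divide built from a reciprocal approximation can be sketched as an approximate reciprocal, one refinement step, and a multiply. The Python below is only an illustration of that idea using ordinary double-precision arithmetic: the 24-bit accuracy of the stand-in approximation and the function names are assumptions, and the actual instruction sequence is the one given in section 5.

```python
import math


def approx_reciprocal(b, bits=24):
    """Stand-in for a reciprocal approximation: 1/b kept to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / b)   # 1/b = mantissa * 2**exponent
    mantissa = math.floor(mantissa * 2**bits) / 2**bits
    return math.ldexp(mantissa, exponent)


def divide(a, b):
    """a / b from an approximate reciprocal and one Newton-Raphson step."""
    r0 = approx_reciprocal(b)
    r1 = r0 * (2.0 - b * r0)   # refinement roughly doubles the correct bits
    return a * r1
```

The refinement step is what lets a low-precision approximation yield a near full-precision quotient after a final multiply.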

Integer or fixed-point operations are integer addition, integer 
subtraction, and integer multiplication. Integer addition and 
subtraction operations produce either 24-bit or 64-bit results. An 
integer multiply operation produces a 24-bit result. A 64-bit integer 
multiply operation is done through a software algorithm using the 
floating-point multiply functional unit to generate multiple partial 
products. These partial products are then shifted and merged to form the 
full 64-bit product. No integer divide instruction is provided; the 
operation is accomplished through a software algorithm using 
floating-point hardware. 
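
The shift-and-merge of partial products can be illustrated in software. This Python sketch splits each operand into pieces, multiplies the pieces, and merges the shifted partial products into the low-order 64 bits. The 24-bit piece width and the function name are assumptions made for illustration; the sketch does not reproduce the actual software algorithm.

```python
MASK64 = (1 << 64) - 1


def multiply64(a, b, chunk=24):
    """Low-order 64 bits of a * b, built from shifted partial products."""
    piece_mask = (1 << chunk) - 1
    a_parts = [(a >> s) & piece_mask for s in range(0, 64, chunk)]
    b_parts = [(b >> s) & piece_mask for s in range(0, 64, chunk)]
    product = 0
    for i, ap in enumerate(a_parts):
        for j, bp in enumerate(b_parts):
            shift = (i + j) * chunk
            if shift < 64:                 # higher terms fall outside 64 bits
                product += (ap * bp) << shift
    return product & MASK64
```

Because the result is kept modulo 2**64, the same routine works for twos complement operands passed as 64-bit patterns.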

The instruction set includes Boolean operations for OR, AND, equivalence, 
and exclusive OR and for a mask-controlled merge operation. Shift 
operations allow the manipulation of either 64-bit or 128-bit operands to 
produce 64-bit results. With the exception of 24-bit integer arithmetic, 
most operations are implemented in vector and scalar instructions. The 
integer product is a scalar instruction designed for index calculation. 
Full indexing capability allows the programmer to index throughout memory 
in either scalar or vector modes. The index can be positive or negative 
in either mode. Indexing allows matrix operations in vector mode to be 
performed on rows or the diagonal as well as conventional column-oriented 
operations. 

Population and parity counts are provided for both vector and scalar 
operations. An additional scalar operation is the leading zero count. 

Characteristics of a CPU computation section are summarized as follows. 

• Integer and floating-point arithmetic 

    - Twos complement integer arithmetic 

    - Signed magnitude floating-point arithmetic 

• Address, scalar, and vector processing modes 

• Fourteen functional units 

• Eight 24-bit address (A) registers 

• Sixty-four 24-bit intermediate address (B) registers 

• Eight 64-bit scalar (S) registers 

• Sixty-four 64-bit intermediate scalar (T) registers 

• Eight 64-element vector (V) registers, 64 bits per element 



OPERATING REGISTERS 

Operating registers, a primary programmable resource of a CPU, enhance 
the speed of the system by satisfying heavy demands for data made by the 
functional units. A single functional unit can require one to three 
operands per clock period (CP) to perform the necessary functions and can 
deliver results at a rate of one per CP. Multiple functional units can 
be used concurrently. 

A CPU has three primary and two intermediate sets of registers. The 
primary sets of registers are address, scalar, and vector, designated as 
A, S, and V, respectively. These registers are considered primary 
because functional units can access them directly. 

For the A and S registers, an intermediate level of registers exists 
which is not accessible to the functional units but acts as a buffer for 
the primary registers. Block transfers are possible between these 
registers and Central Memory so that the number of memory reference 
instructions required for scalar and address operands is greatly 
reduced. The intermediate registers that support the A registers are 
referred to as B registers. The intermediate registers that support S 
registers are referred to as T registers. 



ADDRESS REGISTERS 

Figure 4-1 shows registers and functional units used for address 
processing. The two types of address registers are designated A 
registers and B registers and are described in the following paragraphs. 



A REGISTERS 

Eight 24-bit A registers serve a variety of applications but are 
primarily used as address registers for memory references and as index 
registers. They provide values for shift counts, loop control, and 
channel I/O operations and receive values of population count and leading 
zeros count. In address applications, A registers index the base address 
for scalar memory references and provide both a base address and an 
address increment for vector memory references. 

The address functional units support address and index generation by 
performing 24-bit integer arithmetic on operands obtained from A 
registers and by delivering the results to A registers. 

Figure 4-1. Address Registers and Functional Units 



Data is moved directly between Central Memory and A registers or is 
placed in B registers. Placing data in B registers allows buffering of 
the data between A registers and Central Memory. Data can also be 
transferred between A and S registers and between A and Shared Address 
(SB) registers. 

The Vector Length (VL) register and Exchange Address (XA) register are 
set by transmitting a value to them from an A register. The VL register 
can also be transmitted to an A register. (The VL register is described 
under Vector Control Registers later in this section.) 

When an instruction delivering new data to an A register issues, a 
reservation is set for that register. The reservation prevents issue of 
instructions that use the register until the new data is delivered. 

In this manual, the A registers are individually referred to by the 
letter A followed by a number ranging from 0 through 7. Instructions 
reference A registers by specifying the register number as the h, i, 
j, or k designator as described in section 5. 

The only register implicitly referenced is the A0 register as illustrated 
in the following instructions: 

Octal Code   CAL Syntax     Description 

010ijkm      JAZ exp        Branch to ijkm if (A0)=0 
011ijkm      JAN exp        Branch to ijkm if (A0)≠0 
012ijkm      JAP exp        Branch to ijkm if (A0) is positive, 
                            includes (A0)=0 
013ijkm      JAM exp        Branch to ijkm if (A0) is negative 
034ijk       Bjk,Ai ,A0     Read (Ai) words to B register jk 
                            from (A0) 
035ijk       ,A0 Bjk,Ai     Store (Ai) words at B register jk to 
                            (A0) 
036ijk       Tjk,Ai ,A0     Read (Ai) words to T register jk from 
                            (A0) 
037ijk       ,A0 Tjk,Ai     Store (Ai) words at T register jk to 
                            (A0) 
176i0k       Vi ,A0,Ak      Read (VL) words to Vi from (A0) 
                            incremented by (Ak) 
176i1k       Vi ,A0,Vk      Read (VL) words to Vi using (A0) + (Vk) 
1770jk       ,A0,Ak Vj      Store (VL) words from Vj to (A0) 
                            incremented by (Ak) 
1771jk       ,A0,Vk Vj      Store (VL) words from Vj using 
                            (A0) + (Vk) 

Section 5 contains additional information on the use of A registers by 
instructions. 



B REGISTERS 

A computation section contains sixty-four 24-bit B registers used as 
intermediate storage for the A registers. Typically, B registers contain 
data to be referenced repeatedly over a sufficiently long span, making it 
unnecessary to retain the data in either A registers or in Central 
Memory. Examples of uses are loop counts, variable array base addresses, 
and dimensions. 

Transfer of a value between an A register and a B register requires only 
1 CP. A block of B registers can be transferred to or from Central 
Memory at the maximum rate of one 24-bit value per CP. A reservation is 
made on all B registers during block transfers to and from B registers. 



NOTE 

Other instructions can issue on the CRAY X-MP while a 
block of B registers is being transferred to or from 
Central Memory. 



B registers are individually referred to by the letter B followed by a 
2-digit octal number ranging from 00 through 77. Instructions reference 
B registers by specifying the B register number in the jk designator as 
described in section 5. 

The only B register implicitly referenced is the B00 register. On 
execution of the return jump instruction, 007ijkm, register B00 is set 
to the next instruction parcel address (P) and a branch to an address 
specified by ijkm occurs. Upon receiving control, the called routine 
conventionally saves (B00) so that the B00 register is available for the 
called routine to initiate return jumps of its own. When a called 
routine wishes to return to its caller, it restores the saved address and 
executes instruction 0050jk. Conventionally, this instruction, which 
is a branch to (Bjk), causes the address saved in Bjk to be entered 
into the P register as the address of the next instruction parcel to be 
executed. 



SCALAR REGISTERS 

Figure 4-2 shows registers and functional units used for scalar 
processing. The two types of scalar registers are designated S registers 
and T registers and are described in the following paragraphs. 

Figure 4-2. Scalar Registers and Functional Units 



S REGISTERS 

Eight 64-bit S registers are the principal scalar registers for a CPU. 
They serve as the source and destination for operands of scalar 
arithmetic and logical instructions. Scalar functional units perform 
both integer and floating-point arithmetic operations. 

S registers can furnish one operand in vector instructions. Single-word 
transmissions of data between an S register and an element of a V 
register are also possible. 

Data is moved directly between Central Memory and S registers or is 
placed in T registers. This intermediate step allows buffering of scalar 
operands between S registers and Central Memory. Data is also 
transferred between A and S registers, between S and Shared Scalar (ST) 
registers, and between S and Semaphore (SM) registers. 

Other uses of the S registers are the setting or reading of the Vector 
Mask (VM) register or the Real-time Clock (RTC) register or setting the 
Interrupt Interval (II) register. 

When an instruction delivering new data to an S register issues, a 
reservation is set for that register preventing issue of instructions 
that read the register until the new data is delivered. 

The S registers are individually referred to by the letter S followed by 
a number ranging from 0 through 7. Instructions reference S registers by 
specifying the register number as the i, j, or k designator as 
described in section 5. 

The only register implicitly referenced is the S0 register, as 
illustrated in the following instructions. 

Octal Code   CAL Syntax   Description 

014ijkm      JSZ exp      Branch to ijkm if (S0)=0 
015ijkm      JSN exp      Branch to ijkm if (S0)≠0 
016ijkm      JSP exp      Branch to ijkm if (S0) is positive, 
                          includes (S0)=0 
017ijkm      JSM exp      Branch to ijkm if (S0) is negative 
052ijk       S0 Si<exp    Shift (Si) left jk places to S0 
053ijk       S0 Si>exp    Shift (Si) right jk places to S0 

The Status register provides the status of the following flags: 

• Processor Number (PN) 

• Program State (PS) 

• Clustered, CLN ≠ 0 (CL) 

• Floating-point Interrupts Enabled (IFP) 

• Floating-point Error (FPE) 

• Bidirectional Memory Enabled (BDM) 

• Operand Range Interrupts Enabled (IOR) 

• Cluster number bits 2^0 through 2^3 (CLN) 

Instruction 073 sends the Status register contents to an S register. 
Section 5 has additional information on the use of S registers by 
instructions. 



T REGISTERS 

The computation section has sixty-four 64-bit T registers used as 
intermediate storage for the S registers. Data is transferred between T 
and S registers and between T registers and Central Memory. Transfer of 
a value between a T register and an S register requires only 1 CP. 
T registers reference Central Memory through block read and block write 
instructions. Block transfers occur at a maximum rate of 1 word per CP. 
A reservation is made on all T registers during block transfers to and 
from T registers. 



NOTE 

Other instructions can issue on the CRAY X-MP while a 
block of T registers is being transferred to or from 
Central Memory. 



T registers are referred to by the letter T and a 2-digit octal number 
ranging from 00 through 77. Instructions reference T registers by 
specifying the octal number as the jk designator as described in 
section 5. 



VECTOR REGISTERS 

Figure 4-3 illustrates the registers and functional units used for vector 
operations. The following paragraphs describe Vector registers and 
Vector Control registers. 



V REGISTERS 

The major computational registers of a CPU are eight V registers, each 
with 64 elements. Each V register element has 64 bits. When associated 
data is grouped into successive elements of a V register, the register 
quantity can be treated as a vector. Examples of vector quantities are 
rows or columns of a matrix or elements of a table. Computational 
efficiency is achieved by identically processing each element of a 
vector. Vector instructions provide for the iterative processing of 
successive V register elements. A vector operation always begins when 
operands are obtained from the first element of the operand V registers 
and the result is delivered to the first element of a V register. 

Figure 4-3. Vector Registers and Functional Units 



Successive elements are provided each CP and, as each operation is 
performed, the result is delivered to successive elements of the result V 
register. The vector operation continues until the number of operations 
performed by the instruction equals a count specified by the VL register 
contents. 

V register contents are transferred to or from Central Memory in a block 
mode by specifying a first word address in Central Memory, an increment 
or decrement for the Central Memory address or a set of indexes contained 
in a separate vector register, and a vector length. The transfer then 
proceeds beginning with the first element of the V register at a maximum 
rate of 1 word per CP, depending upon bank conflicts. 

Discontinuities in the vector data stream can occur as a result of memory 
conflicts. These discontinuities, although not inhibiting chained 
operations, can appear in the chained operation data stream. Any 
discontinuity in the data stream adds proportionally to the total 
execution time of the vector operation. 

Single-word data transfers are possible between an S register and an 
element of a V register. 

Since many vectors exceed 64 elements, a long vector is processed as one 
or more 64-element segments and a possible remainder of fewer than 64 
elements. Generally, it is convenient to compute the remainder and 
process this short segment before processing the remaining number of 
64-element segments. A programmer can choose, however, to construct the 
vector loop code in a number of ways. The processing of long vectors in 
FORTRAN is handled by the compiler and is transparent to the programmer. 
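
The segmenting of a long vector can be sketched as follows. This Python example processes the short remainder first and then the full 64-element segments, as the text suggests; the function names are illustrative, and the inner loop merely stands in for a single vector instruction operating on one segment.

```python
def segments(n, max_vl=64):
    """Yield (start, length) pairs: short remainder first, then full segments."""
    start = 0
    remainder = n % max_vl
    if remainder:
        yield start, remainder
        start += remainder
    while start < n:
        yield start, max_vl
        start += max_vl


def vector_add(a, b):
    """Element-wise sum of two equal-length lists, one segment at a time."""
    result = [0] * len(a)
    for start, vl in segments(len(a)):
        # Stands in for one vector add with the vector length set to vl.
        for k in range(start, start + vl):
            result[k] = a[k] + b[k]
    return result
```

For a 150-element vector this yields one 22-element segment followed by two 64-element segments.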

A V register receiving results can also supply operands to a subsequent 
operation. Using a register as both a result and an operand register in 
two different operations allows for the chaining together of two or more 
vector operations and two or more results can be produced per CP. The 
CPU automatically detects chained operations; they are not explicitly 
specified by the programmer. A programmer can reorder certain code 
segments to gain as much concurrency as possible in chained operations. 

A conflict can occur between vector and scalar operations involving 
either floating-point operations or memory access. With the exception of 
these operations, the functional units are always available for scalar 
operations. A vector operation occupies the selected functional unit 
until the vector is processed. 

Parallel vector operations can be processed in two ways: 

• Using different functional units and all different V registers 

• Using the result stream from one V register simultaneously as the 
operand to another operation using a different functional unit 
(chain mode) 

Parallel operations on vectors allow the generation of two or more 
results per CP. Most vector operations use two V registers as operands 
or one S and one V register as operands. Exceptions are vector shifts, 
vector logicals, vector reciprocals, and the load or store instructions. 

The V registers are individually referred to by the letter V followed by 
a number ranging from 0 through 7. Vector instructions reference V 
registers by specifying the register number as the i, j, or k 
designator as described in section 5. 

Individual elements of a V register are designated by decimal numbers 
ranging from 00 through 63. These appear as subscripts to vector 
register references. For example, V6 with subscript 29 refers to 
element 29 of V register 6. 



NOTE 

Parallel loading and storing of V registers is 
possible; two load operations and one store operation 
can occur simultaneously. 



V register reservations and chaining 

Reservation describes the condition of a register in use; that is, the 
register is not available for another operation as a result or as an 
operand register. Each register has two reservation conditions: one 
reserving it as an operand register and one reserving it as a result 
register. During execution of a vector instruction, reservations are 
placed on the operand V registers and on the result V register. These 
reservations are placed on the registers themselves, not on individual 
elements of the V register. 

If a V register is reserved as a result and not as an operand, it can be 
used at any time as an operand and chaining occurs. This flexible 
chaining mechanism allows chaining to begin at any point in the result 
vector data stream. Full chaining occurs if the instruction causing 
chaining is issued before or at the time element 0 of the result arrives 
at the V register. Partial chaining occurs if the instruction issues 
after the arrival of element 0. Thus, the amount of concurrency in a 
chained operation depends upon the relationship between the issue time of 
the chaining instruction and the result vector data stream. 

If a V register is reserved as an operand, it cannot be used as a result 
or operand register until the operand reservation clears. A V register 
can be used, however, as both an operand and result in the same vector 
operation. A V register can serve only one vector operation as the 
source of one or both operands. A V register can serve only one vector 
operation as a result. 

No reservation is placed on the VL register during vector processing. If 
a vector instruction employs an S register, no reservation is placed on 
the S register. The S register can be modified in the next instruction 
after vector issue without affecting the vector operation. The length 
and scalar operand (if appropriate) of each vector operation is 
maintained apart from the VL register and S register. Vector operations 
employing different lengths can proceed concurrently. 

Even when a vector load operation pauses, allowing instructions to get 
synchronized, a few cycles later chained operations may proceed as soon 
as data becomes available. (Thus, if a late chain slot is made, the loop 
might run at full speed.) 

The A0 and Ak registers in a vector memory reference are treated 
similarly and are available for modification immediately after use. 

CAUTION 

CRI cautions against using a vector register as both a 
result and an operand if compatibility between a 
CRAY X-MP and CRAY-1 computer system is necessary 
because vector recursion is not available on all Cray 
computer systems. 



VECTOR CONTROL REGISTERS 

The Vector Length (VL) register and Vector Mask (VM) register provide 
control information needed in the performance of vector operations and 
are described below. 



Vector Length register 

The 7-bit VL register is set to 1 through 100 octal (VL = 0 gives VL = 100 octal), 
specifying the length of all vector operations performed by vector 
instructions and the length of the vectors held by the V registers. The 
VL register controls the number of operations performed for instructions 
140 through 177 and is set to an A register value using instruction 0020 
or read using instruction 023i01. 



Vector Mask register 

The VM register has 64 bits, each corresponding to a word element in a V 
register. Bit 2^63 corresponds to element 0, bit 2^0 to element 63. 
The mask is used with vector merge and test instructions to allow 
operations to be performed on individual vector elements. 

The VM register can be set from an S register through instruction 003 or 
can be created by testing a V register for a condition using instruction 
175. The mask controls element selection in the vector merge 
instructions (146 and 147). Instruction 073 sends the VM register 
contents to an S register. 
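
The correspondence between mask bits and vector elements can be sketched in Python. The helper names are hypothetical, and the selection sense used in `merge` (a 1 bit selects the first operand) is an assumption made for illustration; section 5 defines the behavior of the merge instructions.

```python
def make_mask(vector, test):
    """Build a 64-bit mask: bit 2**63 is element 0, bit 2**0 is element 63."""
    vm = 0
    for element, value in enumerate(vector):
        if test(value):
            vm |= 1 << (63 - element)
    return vm


def merge(vj, vk, vm):
    """Take vj[e] where the mask bit for element e is 1, otherwise vk[e]."""
    return [a if (vm >> (63 - e)) & 1 else b
            for e, (a, b) in enumerate(zip(vj, vk))]
```

For example, testing a vector for negative elements and merging it with a vector of zeros keeps only the negative elements.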



FUNCTIONAL UNITS 

Instructions other than simple transmits or control operations are 
performed by specialized hardware known as functional units. Each unit 
implements an algorithm or a portion of the instruction set. Functional 
units have independent logic except for the Reciprocal Approximation, 
Vector Population Count, Floating-point Multiply, and Second Vector 
Logical units (described later in this section), which share some logic. 
All functional units can be in operation simultaneously. 

A functional unit receives operands from registers and delivers the 
result to a register when the function has been performed. Functional 
units operate essentially in three-address mode with source and 
destination addressing limited to register designators. 

All functional units perform algorithms in a fixed amount of time; delays 
are impossible once the operands have been delivered to the unit. Time 
required from delivery of the operands to the functional unit until 
completion of the calculation is called the functional unit time and is 
measured in CPs. 

Functional units are fully segmented. This means a new set of operands 
for unrelated computation can enter a functional unit each CP even though 
the functional unit time can be more than 1 CP. This segmentation is 
possible when information arrives at the functional unit and is held in 
the functional unit or moves within the functional unit at the end of 
every CP. 

Fourteen functional units are identified and are arbitrarily described in 
four groups: address, scalar, vector, and floating-point. Each of the 
first three groups functions with one of the primary register types (A, 
S, and V) to support the address, scalar, and vector modes of processing 
available in the system. The fourth group, floating-point, supports 
either scalar or vector operations and accepts operands from or delivers 
results to S or V registers. In addition, Central Memory acts like a 
fifteenth functional unit for vector operations. 



ADDRESS FUNCTIONAL UNITS 

Address functional units perform 24-bit integer arithmetic on operands 
obtained from A registers and deliver the results to an A register. The 
arithmetic is twos complement. 



Address Add functional unit 

The Address Add functional unit performs 24-bit integer addition and 
subtraction. The unit executes instructions 030 and 031. Addition and 
subtraction are performed in a similar manner. The twos complement 
subtraction for instruction 031 occurs when the ones complement of the 
Ak operand is added to the Aj operand. Then a 1 is added in the 
low-order bit position of the result. The Address Add functional unit 
detects no overflow. 

The Address Add functional unit time is 2 CPs. 



Address Multiply functional unit 

The Address Multiply functional unit executes instruction 032 forming a 
24-bit integer product from two 24-bit operands. No rounding is 
performed. The result consists of the least significant 24 bits of the 
product. 

This functional unit is designed to handle address manipulations not 
exceeding its data capabilities. The programmer must be careful when 
multiplying integers in the functional unit because the unit does not 
detect overflow of the product and significant portions of the product 
could be lost. 

The Address Multiply functional unit time is 4 CPs. 
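
The 24-bit arithmetic of the two address functional units can be modeled in a few lines of Python. The function names are illustrative; the masking to 24 bits stands in for the fixed register width and shows how overflow is silently discarded.

```python
MASK24 = (1 << 24) - 1


def address_add(aj, ak):
    """24-bit sum; a carry out of the high-order bit is lost."""
    return (aj + ak) & MASK24


def address_subtract(aj, ak):
    """Aj - Ak as Aj plus the ones complement of Ak, plus 1."""
    ones_complement = ~ak & MASK24
    return (aj + ones_complement + 1) & MASK24


def address_multiply(aj, ak):
    """Least significant 24 bits of the product; overflow is not detected."""
    return (aj * ak) & MASK24
```

Multiplying 0o10000 by 0o10000, for instance, gives 2**24, whose low-order 24 bits are all 0, so the significant part of the product is lost.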



SCALAR FUNCTIONAL UNITS 

Scalar functional units perform operations on 64-bit operands obtained 
from S registers and usually deliver the 64-bit results to an S 
register. The exception is the Population/Leading Zero Count functional 
unit which delivers its 7-bit result to an A register. 

Four functional units are exclusively associated with scalar operations 
and are described below. Three functional units are used for both scalar 
and vector operations and are described in the subsection on 
Floating-point Functional Units. 



Scalar Add functional unit 

The Scalar Add functional unit performs 64-bit integer addition and 
subtraction and executes instructions 060 and 061. Addition and 
subtraction are performed in a similar manner. The twos complement 
subtraction for instruction 061 occurs when the ones complement of the 
Sk operand is added to the Sj operand. Then a 1 is added in the 
low-order bit position of the result. The Scalar Add functional unit 
detects no overflow. 

The Scalar Add functional unit time is 3 CPs. 



Scalar Shift functional unit 

The Scalar Shift functional unit shifts the entire 64-bit contents of an 
S register or shifts the double 128-bit contents of two concatenated S 
registers. Shift counts are obtained from an A register or from the jk 
portion of the instruction. Shifts are end off with zero fill. For a 
double shift, a circular shift is effected if the shift count does not 
exceed 64 and the i and j designators are equal and nonzero. 

The Scalar Shift functional unit executes instructions 052 through 057. 
Single-shift instructions (052 through 055) have a functional unit time 
of 2 CPs. Double-shift instructions (056 and 057) have a functional unit 
time of 3 CPs. 
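
The circular effect of a double shift with equal designators can be seen in a short Python model. The function names are illustrative, and the choice of the upper 64 bits of the 128-bit value as the result of a left double shift is an assumption made for this sketch.

```python
MASK64 = (1 << 64) - 1


def shift_left(si, count):
    """Single left shift, end off with zero fill."""
    return (si << count) & MASK64


def shift_right(si, count):
    """Single right shift, end off with zero fill."""
    return (si & MASK64) >> count


def double_shift_left(si, sj, count):
    """Shift the 128-bit value (si, sj) left; keep the upper 64 bits."""
    combined = ((si & MASK64) << 64) | (sj & MASK64)
    return ((combined << count) >> 64) & MASK64
```

When both operands are the same register and the count does not exceed 64, bits shifted off the high end reappear at the low end, which is a rotate.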



Scalar Logical functional unit 

The Scalar Logical functional unit performs bit-by-bit manipulation of 
64-bit quantities obtained from S registers. It executes instructions 
042 through 051, the mask and Boolean instructions. Instructions 042 
through 051 have a functional unit time of 1 CP. 



Scalar Population/Parity/Leading Zero functional unit 

This functional unit executes instructions 026 and 027. Instruction 
026ij0 counts the number of bits in an S register having a value of 1 
and has a functional unit time of 4 CPs. Instruction 026ij1 returns a 
1-bit population parity count (even parity) of the Sj register's 
contents. Instruction 027 counts the number of 0 bits preceding the 
highest-order 1 bit in the operand and has a functional unit time of 3 
CPs. For these instructions, the 64-bit operand is obtained from an S 
register and the 7-bit result is delivered to an A register. 
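
The three results produced by this unit can be sketched in Python as follows. The function names are hypothetical and the code only illustrates the arithmetic; it says nothing about timing or instruction encoding.

```python
MASK64 = (1 << 64) - 1


def population_count(s: int) -> int:
    """Number of 1 bits in a 64-bit operand (0 through 64)."""
    return bin(s & MASK64).count("1")


def population_parity(s: int) -> int:
    """Low-order bit of the population count."""
    return population_count(s) & 1


def leading_zero_count(s: int) -> int:
    """Number of 0 bits above the highest-order 1 bit (64 for a zero word)."""
    return 64 - (s & MASK64).bit_length()
```

The largest possible count is 64, which fits in the 7-bit result delivered to an A register.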



VECTOR FUNCTIONAL UNITS 

Most vector functional units perform operations on operands obtained from 
one or two V registers or from a V register and an S register. The 
Reciprocal, Shift, and Population/Parity functional units, which require 
only one operand, are exceptions. Results from a vector functional unit 
are delivered to a V register. 

Successive operand pairs are transmitted each CP to a functional unit. 
The corresponding result emerges from the functional unit n CPs later, 
where n is the functional unit time and is constant for a given 
functional unit. The VL register determines the number of operand pairs 
to be processed by a functional unit. 

The functional units described in this section are exclusively associated 
with vector operations. Three functional units are associated with both 
vector operations and scalar operations and are described in the 
subsection on floating-point functional units. When a floating-point 
functional unit is used for a vector operation, the general description 
of vector functional units given in the subsection applies. 



Vector functional unit reservation 

A functional unit engaged in a vector operation remains busy during each 
CP and cannot participate in other operations. In this state, the 
functional unit is reserved. Other instructions requiring the same 
functional unit do not issue until the previous operation is completed 
(with the exception of instructions 140 to 145, which may use either of 
the vector logical units). When the vector operation completes, the 
reservation is dropped and the functional unit is then available for 
another operation. A vector functional unit is reserved for (VL) + 4 CPs. 



Vector Add functional unit 

The Vector Add functional unit performs 64-bit integer addition and 
subtraction for a vector operation and delivers the results to elements 
of a V register. The unit executes instructions 154 through 157. 
Addition and subtraction are performed in a similar manner. For 
subtraction operations (156 and 157), the Vk operand is complemented 
before addition and a 1 is added into the low-order bit position of the 
result. The unit detects no overflow. 

The Vector Add functional unit time is 3 CPs. 



Vector Shift functional unit 

The Vector Shift functional unit shifts the entire 64-bit contents of a V 
register element or the 128-bit value formed from two consecutive 
elements of a V register. Shift counts are obtained from an A register. 
Shifts are end off with zero fill. 

All shift counts are considered positive unsigned integers. If any bit 
higher than 2^6 is set, the shifted result is all zeros. 

The Vector Shift functional unit executes instructions 150 through 153. 
The functional unit time is 4 CPs for instruction 152, and the functional 
unit time is 3 CPs for instructions 150, 151, and 153. 

Vector Logical functional units 

The CRAY X-MP four-processor series has two vector logical functional 
units: a Full Vector Logical unit and a Second Vector Logical unit. 

The Full Vector Logical unit performs bit-by-bit manipulations of the 
64-bit quantities for instructions 140 through 147, logical operations 
associated with the vector mask instruction 175, and index generation. 
The Second Vector Logical unit performs bit-by-bit manipulations of 
64-bit quantities for instructions 140 through 145 only. 

Since both vector logical units can be used for instructions 140 through 
145, when these instructions issue to the CIP register, a selection is 
made to determine which vector functional unit is used. Once a selection 
has been made, the instruction is committed to using that functional unit. 
Normally, the instructions attempt to issue first to the Second Vector 
Logical unit and then, if the unit is busy, attempt to issue to the Full 
Vector Logical unit. If both units are busy, the first unit to clear is 
selected. The Second Vector Logical unit may be busy because of another 
instruction or because the unit is disabled. If there are other 
conflicts (register reservations) for the Second Vector Logical unit at 
the time the selection is made, the instructions will issue to the Full 
Vector Logical unit even though the Second Vector Logical unit clears 
before the instruction issues. When the Second Vector Logical unit is 
disabled, the functional unit busy always appears set and causes all 140 
through 145 instructions to issue to the Full Vector Logical unit. 

When the Second Vector Logical unit is enabled, it shares input and 
output datapaths and the same functional unit busy with the 
Floating-point Multiply unit, so they cannot be used simultaneously. 
Also, since the Second Vector Logical unit ties up the Floating-point 
Multiply unit, some codes that rely on floating-point products may run 
slower if the Second Vector Logical unit is enabled. 

The Second Vector Logical unit can be enabled and disabled through 
software by clearing bit of word 3 in the Exchange Package of a user 
program. If the bit is cleared, the unit is disabled and only the Full 
Vector Logical unit is available to instructions 140 through 145. 

Because instruction 175 uses the Full Vector Logical unit, it cannot be 
chained with instructions 146 and 147, nor may it be chained with 
instructions 140 through 145 unless the Second Vector Logical unit is 
enabled and the instructions issue through that unit. 

The Full Vector Logical functional unit time is 2 CPs; the Second Vector 
Logical functional unit time is 4 CPs. 

Vector Population/Parity functional unit 

The Vector Population/Parity functional unit counts the 1 bits in each 
element of the source V register. The total number of 1 bits is the 
population count. This population count can be an odd or an even number, 
as shown by its low-order bit. 

Instructions 174ij1 (vector population count) and 174ij2 (vector 
population count parity) use the same operation code as the vector 
reciprocal approximation instruction. Some restrictions for the 
Reciprocal Approximation functional unit also apply for vector population 
instructions (refer to subsection on Reciprocal Approximation). The 
vector population count instruction delivers the total population count 
to elements of the destination V register. 

The vector population count parity instruction delivers the low-order bit 
of the count to the destination V register. The Vector Population/Parity 
functional unit time is 5 CPs. 



FLOATING-POINT FUNCTIONAL UNITS 

Three floating-point functional units perform floating-point arithmetic 
for scalar and vector operations. When executing a scalar instruction, 
operands are obtained from S registers and results are delivered to an S 
register. When executing most vector instructions, operands are obtained 
from pairs of V registers, or from an S register and a V register. 
Results are delivered to a V register. An exception is the Reciprocal 
Approximation unit requiring only one input operand. 

The subsection on Floating-point Arithmetic contains information on 
floating-point out-of-range conditions. 



Floating-point Add functional unit 

The Floating-point Add functional unit performs addition or subtraction 
of 64-bit operands in floating-point format and executes instructions 
062, 063, and 170 through 173. A result is normalized even when operands 
are unnormalized. (The subsection on Floating-point Arithmetic describes 
normalized floating-point numbers.) Out-of-range exponents are detected 
as described in the subsection on Floating-point Arithmetic. 

Floating-point Add functional unit time is 6 CPs. 



Floating-point Multiply functional unit 

The Floating-point Multiply functional unit executes instructions 064 
through 067 and 160 through 167. These instructions provide for full- 
and half-precision multiplication of 64-bit operands in floating-point 
format and for computing two minus a floating-point product for 
reciprocal iterations. 

The half-precision product is rounded; the full-precision product can be 
rounded or not rounded. 

Input operands are assumed to be normalized. The Floating-point Multiply 
functional unit delivers a normalized result only if both input operands 
are normalized. 

Out-of-range exponents are detected as described in the subsection on 
Floating-point Arithmetic. If both operands have zero exponents, 
however, the result is considered as an integer product, is not 
normalized, and is not considered out-of-range. This case provides a 
fast method of computing a 48-bit integer product, although the operands 
in this case must be shifted before the multiply operation. 

Because the Second Vector Logical functional unit and the Floating-point 
Multiply functional units share input and output datapaths, they cannot 
be used simultaneously. A reservation on one is a reservation on the 
other. 

The Floating-point Multiply functional unit time is 7 CPs. 



Reciprocal Approximation functional unit 

The Reciprocal Approximation functional unit finds the approximate 
reciprocal of a 64-bit operand in floating-point format. The unit 
executes instructions 070 and 174ij0. Since the Vector Population/Parity 
functional unit shares some logic with this unit, the k designator must 
be 0 for the reciprocal approximation instruction to be recognized. 

The input operand is assumed to be normalized and if so, the result is 
correct. The high-order bit of the coefficient is not tested but is 
assumed to be a 1. Out-of-range exponents are detected as described 
under Floating-point Arithmetic. 

The Reciprocal Approximation functional unit time is 14 CPs. 



ARITHMETIC OPERATIONS 

Functional units in a CPU perform either twos complement integer 
arithmetic or floating-point arithmetic. 

INTEGER ARITHMETIC 

All integer arithmetic, whether 24 bits or 64 bits, is twos complement 
and is represented in the registers as illustrated in figure 4-4. The 
Address Add and Address Multiply functional units perform 24-bit 
arithmetic. The Scalar Add and the Vector Add functional units perform 
64-bit arithmetic. 

Multiplication of two scalar (64-bit) integer operands is accomplished by 
using the floating-point multiply instruction and one of the two methods 
that follows. The method used depends on the magnitude of the operands 
and the number of bits to contain the product. 

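[Figure 4-4 shows two word layouts: a 24-bit twos complement integer in 
bits 2^23 through 2^0 and a 64-bit twos complement integer in bits 2^63 
through 2^0. In each layout the high-order bit is labeled Sign.] 

Figure 4-4. Integer Data Formats 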



If the operands are nonzero only in the 24 least significant bits, the 
two integer operands can be multiplied by shifting them each left 24 bits 
before the multiply operation. (The Floating-point Multiply functional 
unit recognizes the conditions where both operands have zero exponents as 
a special case.) The Floating-point Multiply functional unit returns the 
high-order 48 bits of the product of the coefficients as the coefficient 
of the result and leaves the exponent field 0. Refer to figure 4-8. If 
the operand coefficients are generated by other than shifting so the 
low-order 24 bits would be nonzero, the low-order 48 bits of the product 
could have been nonzero, and the high-order 48 bits (the return part) 
could be one larger than expected as a truncation compensation constant 
is always added during a multiply. 
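
The shift-then-multiply method for small operands can be sketched in Python. This is an idealized illustration: it computes the exact 96-bit product and ignores the truncation compensation constant, which the text indicates does not disturb the result when the low-order 24 bits of each shifted coefficient are zero. The function name `integer_multiply_via_fp` is hypothetical.

```python
LIMIT_24 = 1 << 24  # operands must be nonzero only in the low 24 bits


def integer_multiply_via_fp(a: int, b: int) -> int:
    """Multiply two nonnegative 24-bit integers the way the text describes.

    Each operand is shifted left 24 bits so that it sits in the upper half
    of a 48-bit coefficient with a zero exponent. The high-order 48 bits
    of the 96-bit coefficient product are then the integer product.
    """
    if not (0 <= a < LIMIT_24 and 0 <= b < LIMIT_24):
        raise ValueError("operands must fit in 24 bits")
    coefficient_a = a << 24
    coefficient_b = b << 24
    full_product = coefficient_a * coefficient_b  # up to 96 bits
    return full_product >> 48  # high-order 48 bits
```

Because each shift contributes 24 low-order zero bits, the low-order 48 bits of the full product are zero and nothing is lost by returning only the high-order half.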

If the operands are greater than 24 bits, multiplication is done by 
forming multiple partial products and then shifting and adding the 
partial products. 

Division is done by an algorithm; the particular algorithm used depends 
on the number of bits in the quotient. The quickest and most frequently 
used method is to convert the numbers to floating-point format and then 
use the floating-point functional units. 

FLOATING-POINT ARITHMETIC 

Floating-point numbers are represented in a standard format throughout 
the CPU. This format is a packed representation of a binary coefficient 
and an exponent (power of two). The coefficient is a 48-bit signed 
fraction. The sign of the coefficient is separated from the rest of the 
coefficient as shown in figure 4-5. Since the coefficient is signed 
magnitude, it is not complemented for negative values. 



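[Figure 4-5 shows the 64-bit word layout: the coefficient sign in bit 
2^63, the exponent in bits 2^62 through 2^48, and the coefficient 
starting at bit 2^47, with the binary point between bits 2^48 and 2^47.] 

Figure 4-5. Floating-point Data Format 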



The exponent portion of the floating-point format is represented as a 
biased integer in bits 2^62 through 2^48. The bias that is added to 
the exponents is 40000 (octal). The positive range of exponents is 40000 
through 57777 (octal). The negative range of exponents is 37777 through 
20000 (octal). Thus, the unbiased range of exponents is the following 
(the negative range is one larger): 

2^-20000 (octal) through 2^+17777 (octal) 

In terms of decimal values, the floating-point format of the CRAY X-MP 
four-processor allows the accurate expression of numbers to about 15 
decimal digits in the approximate decimal range of 10^-2466 through 
10^+2465. 

Figure 4-6 and the following steps show the relationship between the 
bias, exponent, and coefficient. To convert the number to its decimal 
equivalent: 

1. Subtract the bias from the exponent to get the integer value of 
the exponent: 

 40001 
-40000 
     1 

2. Multiply 2 raised to the integer value of the exponent by the 
normalized coefficient, expressed as a fraction, to get the 
result: 

2^1 x 0.4 (octal) = 1.0 (decimal) 
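
The two conversion steps can be sketched in Python. The field positions follow the format described in this subsection; the function name `decode_word` is hypothetical, and `fractions.Fraction` is used because the exponent range far exceeds what a native Python float can hold.

```python
from fractions import Fraction

BIAS = 0o40000
EXPONENT_MASK = 0o77777          # 15-bit exponent field
COEFFICIENT_MASK = (1 << 48) - 1  # 48-bit coefficient field


def decode_word(word: int) -> Fraction:
    """Convert a 64-bit floating-point word to an exact rational value."""
    sign = (word >> 63) & 1
    exponent = (word >> 48) & EXPONENT_MASK
    coefficient = word & COEFFICIENT_MASK
    # Step 1: subtract the bias from the exponent.
    unbiased = exponent - BIAS
    # Step 2: multiply 2**unbiased by the coefficient taken as a fraction.
    value = Fraction(coefficient, 1 << 48) * Fraction(2) ** unbiased
    return -value if sign else value
```

With an exponent of 40001 (octal) and a coefficient of 4000000000000000 (octal), `decode_word` returns 1, matching the worked steps. A word of all zeros decodes to 0 because its coefficient is 0.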

[Figure 4-6 shows the coefficient sign, exponent, and normalized 
coefficient fields of a word in octal, with the binary point between 
bits 2^48 and 2^47. The normalized coefficient shown is 
4000000000000000.] 

Figure 4-6. Internal Representation of Floating-point Number (Octal) 



A zero value or an underflow result is not biased and is represented as a 
word of all zeros. 

A negative 0 is not generated by any floating-point functional unit, 
except in the case where a negative 0 is one operand going into the 
Floating-point Multiply functional unit. 

The remainder of this subsection describes normalized floating-point 
numbers, floating-point range errors, double-precision numbers, and the 
addition, multiplication, and division algorithms. 



Normalized floating-point numbers 

A nonzero floating-point number is normalized if the most significant bit 
of the coefficient is nonzero. This condition implies the coefficient 
has been shifted as far left as possible and the exponent has been 
adjusted accordingly. Therefore, the floating-point number has no 
leading zeros in the coefficient. The exception is that a normalized 
floating-point zero is all zeros. 

When a floating-point number is created by inserting an exponent of 
40060 (octal) into a 48-bit integer word, the result should be normalized 
before being used in a floating-point operation. Normalization is 
accomplished by adding the unnormalized floating-point operand to 0. 
Since S0 provides a 64-bit zero when used in the Sj field of an 
instruction, an operand in Sk is normalized using the 062i0k 
instruction. Si, which can be Sk, contains the normalized result. 

The 170i0k instruction normalizes Vk into Vi. 
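
The effect of normalization on the exponent and coefficient fields can be sketched in Python. This illustrates the definition only: it does not model the add unit, it ignores underflow, and the function name `normalize` is hypothetical.

```python
def normalize(exponent: int, coefficient: int) -> tuple[int, int]:
    """Shift the coefficient left until bit 2^47 is set, adjusting the exponent.

    `exponent` is the biased 15-bit exponent and `coefficient` the 48-bit
    magnitude. A zero coefficient yields the all-zeros representation.
    """
    if coefficient.bit_length() > 48:
        raise ValueError("coefficient must fit in 48 bits")
    if coefficient == 0:
        return 0, 0
    shift = 48 - coefficient.bit_length()
    return exponent - shift, coefficient << shift
```

For the integer 5 packed with an exponent of 40060 (octal), the coefficient is shifted left 45 places and the exponent becomes 40003 (octal). The value is unchanged: 0.101 (binary) times 2^3 is 5.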



Floating-point range errors 

Overflow of the floating-point range is indicated by an exponent value of 
60000 (octal) or greater in packed format. Detection of the overflow 
condition initiates an interrupt if the Floating-point Mode flag is set 
in the Mode register and monitor mode is not in effect. The 
Floating-point Mode flag can be set or cleared by a user mode program. 

The Cray operating system COS keeps a bit in a table to indicate the 
condition of the mode bit. System software manipulates the mode bit and 
uses the table bit to indicate how the mode should be left for the user. 
Therefore, the user usually needs to put the appropriate bit in the table 
if the user changes the mode. 

Floating-point range error conditions are detected by the floating-point 
functional units as described in the following paragraphs. 

Floating-point Add functional unit - A floating-point add range error 
condition is generated for scalar operands when the larger incoming 
exponent is greater than or equal to 60000 (octal). This condition sets 
the Floating-point Error flag with an exponent of 60000 (octal) being 
sent to the result register along with the computed coefficient, as in 
the following example: 

 60000.4xxxxxxxxxxxxxxx   Range Error 
+57777.4xxxxxxxxxxxxxxx 

 60000.6xxxxxxxxxxxxxxx   Result Register 



NOTE 

If a floating-point add or subtract generates an 
exponent less than 20000 (octal) or a coefficient of 0, 
the condition is considered an underflow; no fault is 
generated and the word returned from the functional 
unit is all 0 bits. If either operand is out-of-bounds 
(exponent of 60000 (octal) or greater) or if the final 
sum or difference is out-of-bounds (exponent of 60000 
(octal) or greater), the exponent is set to 60000 
(octal), and a floating-point error is flagged. If 
floating-point faults are enabled, an interrupt occurs. 
Refer to the floating-point range errors subsection for 
more information. 



Floating-point Multiply functional unit - Whether or not out-of-range 
conditions occur, and how they are handled, can be determined using the 
exponent matrix shown in figure 4-7. The exponent of the result, for any 
set of exponents, falls into one of seven unique zones. A description of 
each zone follows. 

[Figure 4-7 is a matrix of operand exponents, with an axis labeled 
Exponent of Operand 1 running to 77777 (octal), divided into the zones 
described below.] 

Figure 4-7. Exponent Matrix for Floating-point Multiply Unit 

Zone   Description 

1      This indicates a simple integer multiply; no fault is possible. 

2      These exponents would result in an underflow condition. It is 
       flagged as such, and the result is set to +0. (Multiply by 0 
       is in this group.) 

3      Underflow may occur on this boundary. The final exponent can 
       be 17777 or 20000 (octal) depending on whether a normalize 
       shift is required. If a normalize shift is required, an 
       exponent of 17777 (octal) is used, the underflow is not 
       detected, and the coefficient and exponent are not zeroed out. 
       Underflow detection is done on the exponent used for an 
       unshifted product coefficient. 

4      The use of an operand with an underflow exponent is allowed if 
       the final result is within the range 20000 to 57777 (octal). 

5      This is the normal operand range and normal results are 
       produced. 

6      Overflow is flagged on this boundary. If a normalize shift 
       is required, the value should be within bounds with a 57777 
       (octal) exponent. Since overflow is detected, however, using 
       the exponent for the unnormalized shift condition (which is 
       60000 octal), a 60000 (octal) is inserted in the product as 
       the final exponent. 

7      Within this zone, an overflow fault is flagged and the product 
       exponent is set to 60000 (octal). 



NOTE 

If either operand is less than the machine minimum, the 
error is suppressed (even though the other operand can 
be out of range) because the operand that is less than 
the machine minimum takes precedence in error detection. 



Out-of-range conditions are tested before normalizing in the 
Floating-point Multiply functional unit. As shown, if both incoming 
exponents are equal to 0, the operation is treated as an integer 
multiply. The result is treated normally with no normalization shift of 
the result allowed. The result is a 48-bit quantity starting with 
bit 2^47. When using this feature, the operands should be considered as 
24-bit integers in bits 2^47 through 2^24. In figure 4-8, if operand 1 
is 4 and operand 2 is 6, a 48-bit result of 30 (octal) is produced. Bit 
2^63 obeys the usual rules for multiplying signs and the result is a 
sign and magnitude integer. The form of integers (refer to figure 4-4) 
accepted by the integer add and subtract and expected by the software is 
twos complement, not sign and magnitude. Therefore, negative products 
must be converted. 
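
The conversion of a negative product can be sketched in Python. The sketch assumes the layout given above (sign in bit 2^63, 48-bit magnitude in the low-order bits, zero exponent); the function name `sign_magnitude_to_twos_complement` is hypothetical, and no particular instruction sequence is implied.

```python
MASK64 = (1 << 64) - 1
MAGNITUDE_MASK = (1 << 48) - 1


def sign_magnitude_to_twos_complement(word: int) -> int:
    """Convert a sign and magnitude product word to a 64-bit twos complement word."""
    sign = (word >> 63) & 1
    magnitude = word & MAGNITUDE_MASK
    if sign:
        return (-magnitude) & MASK64  # negate, then wrap to 64 bits
    return magnitude
```

A product word with the sign bit set and a magnitude of 30 (octal) converts to the 64-bit twos complement representation of -24.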

If bits 2^0 through 2^23 in operands 1 and 2 of figure 4-8 have any 
1 bits, the product might be one (2^0) too large because a truncation 
compensation constant is added during the multiply process. (The 
following paragraphs discuss the truncation constant and its use.) The 
size of the shaded area in operands 1 and 2 (figure 4-8) does not need to 
be the same for both operands. To get a correct product, the only 
requirement is that the sum of the number of bits in the shaded area is 
48 bits or more. If the sum is more than 48 bits, the binary point in 
the product is the number of places to the left that the sum is in excess 
of 48 (that is, assuming the operand binary points are at the left 
boundary of the shaded areas). 

Floating-point Reciprocal Approximation functional unit - For the 
Floating-point Reciprocal Approximation functional unit, an incoming 
operand with an exponent less than or equal to 20001 (octal) or greater 
than or equal to 60000 (octal) causes a floating-point range error. The 
error flag is set and an exponent of 60000 (octal) and the computed 
coefficient are sent to the result register. 



Double-precision numbers 

The CPU does not provide special hardware for performing double- or 
multiple-precision operations. Double-precision computations with 95-bit 
accuracy are available through software routines provided by CRI. 



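[Figure 4-8 shows three 64-bit words, each with a sign bit, with bit 
positions 2^63, 2^47, 2^23, and 2^0 marked. Operand 1 holds 04 (octal) 
and operand 2 holds 06 (octal); in each operand the shaded low-order 
area must be 0 to ensure the product is correct. The result holds 030 
(octal).] 

Figure 4-8. Integer Multiply in Floating-point Multiply 
Functional Unit 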

Addition algorithm 

Floating-point addition or subtraction is performed in a 49-bit register 
(figure 4-9). Trial subtraction of the exponents selects the operand to 
be shifted down for aligning the operands. The larger exponent operand 
carries the sign. The coefficient of the number with the smaller 
exponent is shifted right to align with the coefficient of the number 
with the larger exponent. Bits shifted out of the register are lost; no 
roundup occurs. If the sum carries into the high-order bit, the 
low-order bit is discarded and an appropriate exponent adjustment is 
made. All results are normalized and if the result is less than the 
machine minimum, the error is suppressed. 
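
The alignment and carry handling described above can be sketched in Python for the simplest case of two positive, normalized operands. This is a deliberately reduced illustration: signs, subtraction, post-normalization by left shifts, and range checks are all omitted, and the function name `fp_add_positive` is hypothetical.

```python
def fp_add_positive(exp_a: int, coef_a: int, exp_b: int, coef_b: int) -> tuple[int, int]:
    """Add two positive operands given as (biased exponent, 48-bit coefficient)."""
    # Trial comparison of the exponents selects the operand to shift down.
    if exp_a < exp_b:
        exp_a, coef_a, exp_b, coef_b = exp_b, coef_b, exp_a, coef_a
    # Align the smaller operand; bits shifted out are lost, with no roundup.
    coef_b >>= exp_a - exp_b
    total = coef_a + coef_b  # may need a 49th bit
    if total >> 48:
        # Carry into the high-order bit: discard the low-order bit
        # and adjust the exponent.
        total >>= 1
        exp_a += 1
    return exp_a, total
```

Adding 1.0 to 1.0, each represented as exponent 40001 (octal) with coefficient 2^47, carries into the 49th bit and yields exponent 40002 (octal) with coefficient 2^47, which is 2.0.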



[Figure 4-9 shows the 49-bit addition register, with a 48-bit field 
and a bit marked Discarded.] 

Figure 4-9. 49-bit Floating-point Addition 



The Floating-point Add functional unit normalizes any floating-point 
number within the format of the Cray floating-point number system. The 
functional unit right shifts 1 or left shifts up to 48 per result to 
normalize the result. 

One zero operand and one valid operand can be sent to the Floating-point 
Add functional unit, and the valid operand is sent through the unit 
normalized. Concurrently, the functional unit checks for overflow and/or 
underflow; underflow results are not flagged as errors. 



Multiplication algorithm 

The Floating-point Multiply functional unit has the two 48-bit 
coefficients as input into a multiply pyramid (refer to figure 4-10). If 
the coefficients are both normalized, a full product is either 95 bits or 
96 bits, depending on the value of the coefficients. A 96-bit product is 
normalized as generated. A 95-bit product requires a left shift of one 
to generate the final coefficient. If the shift is done, the final 
exponent is reduced by one to reflect the shift. 
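
The 95-bit and 96-bit cases can be sketched in Python. The sketch computes the exact product, so it leaves out the truncation, compensation constant, and rounding discussed below. It also assumes the result exponent is the sum of the biased exponents minus one bias, which is an assumption of this example rather than a statement from the text. The name `multiply_normalized` is hypothetical.

```python
BIAS = 0o40000
HIGH_BIT = 1 << 47  # most significant bit of a 48-bit coefficient


def multiply_normalized(exp_a: int, coef_a: int, exp_b: int, coef_b: int) -> tuple[int, int]:
    """Multiply two normalized operands given as (biased exponent, coefficient)."""
    if not (coef_a & HIGH_BIT and coef_b & HIGH_BIT):
        raise ValueError("both coefficients must be normalized")
    product = coef_a * coef_b          # 95 or 96 bits
    exponent = exp_a + exp_b - BIAS    # assumed exponent rule
    if product.bit_length() == 95:
        product <<= 1                  # left shift of one to normalize
        exponent -= 1                  # exponent reduced to reflect the shift
    return exponent, product >> 48     # keep the high-order 48 bits
```

Multiplying 1.0 by 1.0 produces a 95-bit product and takes the shift path, while 1.5 times 1.5 produces a 96-bit product that is already normalized.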

[Figure 4-10 shows the pyramid of partial-product sums formed from the 
multiplicand, with product bit designations (see footnote) for the case 
where a shift is needed to normalize the coefficient and the case where 
it is not. The figure carries the following notes: 

(1) hh = 11 (binary) for half-precision round, 00 (binary) for 
    full-precision rounded or full-precision unrounded multiply 

(2) ff = 11 (binary) for full-precision round, 00 (binary) for 
    half-precision rounded or full-precision unrounded multiply 

(Y) Truncation compensation constant, 1001 (binary), used for all 
    multiplies] 

Figure 4-10. Floating-point Multiply Partial-product Sums Pyramid 

Footnote: Bit designations are used in the explanation of the 
Floating-point Multiply functional unit operation. 

The following discussion and the power of two designators used assume 
that the product generated is in its final form; that is, no shift was 
required. 

On the system, the pyramid truncates part of the low-order bits of the 
96-bit product. To adjust for this truncation, a constant is 
unconditionally added above the truncation. The average value of this 
truncation is 9.25 x 2^-56, which was determined by adding all carries 
produced by all possible combinations that could be truncated and 
dividing the sum by the number of possible combinations. Nine carries 
are injected at the 2^-56 position to compensate for the truncated 
bits. The effect of the truncation without compensation is at most a 
result coefficient one smaller than expected. With compensation, the 
results range from one too large to one too small in the 2^-48 bit 
position with approximately 99 percent of the values having zero 
deviation from what would have been generated had a full 96-bit pyramid 
been present. The multiplication is commutative; that is, A times B 
equals B times A. 

Rounding is optional where truncation compensation is not. The rounding 
method used adds a constant so that it is 50 percent high (0.25 x 2^-48 
high) 38 percent of the time and 25 percent low (0.125 x 2^-48 low) 62 
percent of the time, resulting in near zero average rounding error. In a 
full-precision rounded multiply, 2 round bits are entered into the 
pyramid at bit positions 2^-50 and 2^-51 and allowed to propagate up the 
pyramid. 

For a half-precision multiply, round bits are entered into the pyramid at 
bit positions 2^-32 and 2^-31. A carry resulting from this entry is 
allowed to propagate up and the 29 most significant bits of the 
normalized result are transmitted back. 

The variation due to this truncation and rounding is in the range: 

-0.23 x 2^-48 to +0.57 x 2^-48 

or 

-8.17 x 10^-16 to +20.25 x 10^-16 

With a full 96-bit pyramid and rounding equal to one-half the least 
significant bit, the variation would be expected to be: 

-0.5 x 2^-48 to +0.5 x 2^-48 



Division algorithm 

The system performs floating-point division through reciprocal 
approximation, facilitating hardware implementation of a fully segmented 
functional unit. Because of this segmentation, operands enter the 
reciprocal unit during each CP. In vector mode, results are produced at 
a 1-CP rate and are used in other vector operations during chaining 
because all functional units in the system have the same result rate. 
The reciprocal approximation is based on Newton's method. 

Newton's method - The division algorithm is an application of Newton's 
method for approximating the real roots of an arbitrary equation 
$F(x) = 0$, for which $F(x)$ must be twice differentiable with a 
continuous second derivative. The method requires making an initial 
approximation (guess), $x_0$, sufficiently close to the true root, 
$x_t$, being sought (refer to figure 4-11). For a better approximation, 
a tangent line is drawn to the graph of $y = F(x)$ at the point 
$(x_0, F(x_0))$. The x intercept of this tangent line is the better 
approximation $x_1$. This can be repeated using $x_1$ to find $x_2$, 
and so on. 



[Figure 4-11 shows the graph of $y = F(x)$ with the point 
$(x_0, F(x_0))$ marked.] 

Figure 4-11. Newton's Method 

Derivation of the division algorithm 

A definition for the derivative $F'(x)$ of a function $F(x)$ at point 
$x_t$ is 

$$F'(x_t) = \lim_{x \to x_t} \frac{F(x) - F(x_t)}{x - x_t}$$ 

if this limit exists. If the limit does not exist, $F(x)$ is not 
differentiable at the point $x_t$. 

For any point $x_i$ near to $x_t$, 

$$F'(x_t) \approx \frac{F(x_i) - F(x_t)}{x_i - x_t}$$ 

where $\approx$ means approximately equal to. 

This approximation improves as $x_i$ approaches $x_t$. Let $x_i$ stand 
for an approximate solution and let $x_t$ stand for the true answer 
being sought. The exact answer is then the value of x that makes $F(x)$ 
equal 0. This is the case when $x = x_t$; therefore $F(x_t)$ in the 
equation above can be replaced by 0, giving the following approximation: 

$$F'(x_t) \approx \frac{F(x_i)}{x_i - x_t} \qquad \text{Approximation (1)}$$ 

$x_t - x_i$ is the correction applied to an approximate answer, $x_i$, 
to give the right answer since $x_i + (x_t - x_i)$ equals $x_t$. Solving 
approximation (1) for $(x_t - x_i)$ gives: 

$$x_t - x_i = \text{correction} \approx -\frac{F(x_i)}{F'(x_t)}$$ 

that is, $-F(x_i)/F'(x_t)$ is the approximate correction. 

If this quantity is substituted into the approximation, then: 

$$x_t \approx (x_i + \text{approximate correction}) = x_{i+1}$$ 

Because $x_t$ is not known, the derivative is evaluated at $x_i$ 
instead. This gives the following equation: 

$$x_{i+1} = x_i - \frac{F(x_i)}{F'(x_i)} \qquad \text{Equation (1)}$$ 

where $x_{i+1}$ is a better approximation than $x_i$ to the true value, $x_t$, 
being sought. The exact answer is generally not obtained at once because 
the correction term is not generally exact. The operation is repeated 
until the answer becomes sufficiently close for practical use. 
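
The repeated correction can be sketched as a generic Python routine. This is an illustration of the general iteration only, using native floats; the function name `newton` and the fixed iteration count are choices made for the example.

```python
from typing import Callable


def newton(f: Callable[[float], float],
           f_prime: Callable[[float], float],
           x0: float,
           iterations: int) -> float:
    """Apply the update x <- x - f(x) / f'(x) a fixed number of times."""
    x = x0
    for _ in range(iterations):
        x = x - f(x) / f_prime(x)
    return x
```

For instance, `newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0, 6)` converges to the square root of 2 to within float precision, provided the starting guess is close enough to the root.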

To make use of Newton's method to find the reciprocal of a number B, 
simply use $F(x) = (1/x - B)$. 

First calculating $F'(x)$ where: 

$$F'(x) = \left(\frac{1}{x} - B\right)' = -\frac{1}{x^2}$$ 

for any point $x_1 \neq 0$, 

$$F'(x_1) = -\frac{1}{x_1^2}$$ 

Choosing for $x_1$ a value near $1/B$ and applying equation (1), 

$$x_2 = x_1 - \frac{\frac{1}{x_1} - B}{-\frac{1}{x_1^2}}$$ 

$$x_2 = x_1 + x_1^2 \left(\frac{1}{x_1} - B\right)$$ 

$$x_2 = x_1 + x_1 - x_1^2 B$$ 

$$x_2 = 2x_1 - x_1^2 B = x_1 (2 - x_1 B)$$ 



On the system, $x_1$ times the quantity in parentheses is performed by a 
floating-point multiply. $2 - x_1 B$ is performed by the reciprocal 
approximation instruction. $x_1$ is the x near $1/B$ and is formed by the 
half-precision reciprocal approximation instruction. 
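
The reciprocal update derived above needs only a multiply and a "two minus product" step, which the following Python sketch makes explicit. It uses native floats rather than the machine format, and the function name `reciprocal_iteration` is hypothetical.

```python
def reciprocal_iteration(x: float, b: float) -> float:
    """One Newton step toward 1/b: x * (2 - x * b). No division is used."""
    correction = 2.0 - x * b   # the "two minus product" term
    return x * correction      # an ordinary multiply


def reciprocal(b: float, guess: float, iterations: int) -> float:
    """Refine an initial guess for 1/b by repeated iteration."""
    x = guess
    for _ in range(iterations):
        x = reciprocal_iteration(x, b)
    return x
```

Starting from a guess of 0.3 for the reciprocal of 3, five iterations agree with 1/3 to within float precision.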

This approximation technique using Newton's method is implemented in the 
system. A hardware table look up provides an initial guess, $x_0$, to 
start the process. 

$x_0(2 - x_0 B)$   1st approximation, I1   Done in reciprocal unit 

$x_1(2 - x_1 B)$   2nd approximation, I2   Done in reciprocal unit 

$x_2(2 - x_2 B)$   3rd approximation, I3   Done in reciprocal unit 

$x_3(2 - x_3 B)$   4th approximation       Done with software 

The system's Reciprocal Approximation functional unit performs three 
iterations: I1, I2, and I3. I1 is accurate to 8 bits and is found after 
a table lookup to choose the initial guess, $x_0$. I2 is the second 
iteration and is accurate to 16 bits. I3 is the final (third) iteration 
answer of the Reciprocal Approximation functional unit, and its result is 
accurate to 30 bits. 
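
The growth in accuracy from one iteration to the next follows from the update formula itself. As a short worked derivation in exact arithmetic (the symbol $e_i$ is introduced here for illustration and is not used elsewhere in this manual), define the relative error of an approximation as $e_i = 1 - x_i B$. Then:

```latex
\begin{aligned}
x_{i+1} B &= x_i B \, (2 - x_i B) \\
          &= (1 - e_i)(1 + e_i) \\
          &= 1 - e_i^2 \\
e_{i+1}   &= 1 - x_{i+1} B = e_i^2
\end{aligned}
```

Each iteration therefore squares the relative error, which roughly doubles the number of correct bits and is consistent with the step from 8 bits to 16 bits. This idealized derivation does not account for the finite width of the hardware, so it does not by itself explain the 30-bit figure stated for I3.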

A fourth iteration uses a special instruction within the Floating-point 
Multiply functional unit to calculate the correction term. This 
iteration is used to increase accuracy of the reciprocal unit's answer to 
full precision. A fifth iteration should not be done. 
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The division algorithm that computes S1/S2 to full-precision requires the 
following operations:

    Operation               Performed By

    S3 = 1/S2               Reciprocal Approximation functional unit

    S4 = (2 - (S3 * S2))    Floating-point Multiply functional unit in
                            iteration mode

    S5 = S4 * S3            Floating-point Multiply functional unit using
                            full-precision; S5 now equals 1/S2 to 48-bit
                            accuracy.

    S6 = S5 * S1            Floating-point Multiply functional unit using
                            full-precision rounded


The reciprocal approximation at step 1 is correct to 30 bits. An 
additional Newton iteration (fourth iteration) at operations 2 and 3 
increases this accuracy to 48 bits. This iteration answer is applied as 
an operand in a full-precision rounded multiply operation to obtain the 
quotient accurate to 48 bits. Additional iterations should not be 
attempted since erroneous results are possible. 



****************************************************** 

CAUTION 

The reciprocal iteration is designed for use once with 
each half-precision reciprocal generated. If the 
fourth iteration (the programmed iteration) results in 
an exact reciprocal or if an exact reciprocal is 
generated by some other method, performing another 
iteration results in an incorrect final reciprocal. 

******************************************************* 



Where 29 bits of accuracy are sufficient, the reciprocal approximation
instruction is used with the half-precision multiply to produce a
half-precision quotient in only two operations.

    Operation               Performed By

    S3 = 1/S2               Reciprocal Approximation functional unit

    S6 = S1 * S3            Floating-point Multiply functional unit in
                            half-precision
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The 19 low-order bits of the half-precision results are returned as zeros 
with a rounding applied to the low-order bit of the 29-bit result. 

Another method of computing divisions is as follows: 

    Operation               Performed By

    S3 = 1/S2               Reciprocal Approximation functional unit

    S5 = S1 * S3            Floating-point Multiply functional unit

    S4 = (2 - (S3 * S2))    Floating-point Multiply functional unit

    S6 = S4 * S5            Floating-point Multiply functional unit

A scalar quotient is computed in 29 CPs since operations 2 and 3 issue in
successive CPs. With this method, the correction to reach a
full-precision reciprocal is applied after the numerator is multiplied
by the half-precision reciprocal rather than before.

A vector quotient using this procedure requires less than four vector
times since operations 1 and 2 are chained together. This overlaps one
of the multiply operations. (A vector time is 1 CP for each element in
the vector.)
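
The two orderings of the division sequence can be compared with a small
sketch. It models the arithmetic only: `rough_reciprocal` is a
hypothetical stand-in for the Reciprocal Approximation functional unit
(it perturbs the true reciprocal by a relative error of 2^-30), and
IEEE doubles replace the machine's floating-point format. Neither
assumption reflects the actual hardware rounding.

```python
def rough_reciprocal(x):
    # Hypothetical stand-in for the hardware unit: the true reciprocal
    # with a relative error of about 2**-30.
    return (1.0 / x) * (1.0 + 2.0 ** -30)


def divide_reciprocal_first(s1, s2, recip=rough_reciprocal):
    """Correct the reciprocal first, then multiply by the numerator."""
    s3 = recip(s2)          # S3 = 1/S2 (approximate)
    s4 = 2.0 - s3 * s2      # S4 = correction term
    s5 = s4 * s3            # S5 = corrected reciprocal
    return s5 * s1          # S6 = quotient


def divide_numerator_first(s1, s2, recip=rough_reciprocal):
    """Multiply by the numerator first, then apply the correction."""
    s3 = recip(s2)          # S3 = 1/S2 (approximate)
    s5 = s1 * s3            # S5 = approximate quotient
    s4 = 2.0 - s3 * s2      # S4 = correction term
    return s4 * s5          # S6 = corrected quotient


q_first = divide_reciprocal_first(7.0, 3.0)
q_alt = divide_numerator_first(7.0, 3.0)
print(q_first, q_alt, 7.0 / 3.0)
```

Both orderings apply the same correction term; they differ only in
where intermediate rounding can occur, so their results may differ in
the last few bits.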

**************************************************** 

CAUTION 

The coefficient of the reciprocal produced by the
alternate method can be as much as 2 x 2^-48 different
from the first method described for generating 
full-precision reciprocals. This difference can occur 
because one method can round up as much as twice while 
the other method may not round at all. One round can 
occur while the correction is generated and the second 
round can occur when producing the final quotient. 

Therefore, if the reciprocals are to be compared, the 
same method should be used each time the reciprocals 
are generated. Cray FORTRAN (CFT) uses a consistent 
method and ensures the reciprocals of numbers are 
always the same. 

******************************************************* 
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For example, two 64-element vectors may be divided in 3 * 64 CPs plus 
overhead. (The overhead associated with the functional units for this 
case is 38 CPs.)



LOGICAL OPERATIONS 

Scalar and vector logical units perform bit-by-bit manipulation of 64-bit 
quantities. Operations provide for forming logical products, 
differences, sums, and merges. 

A logical product is the AND function:

    Operand 1    1 0 1 0
    Operand 2    1 1 0 0
    Result       1 0 0 0

A logical sum is the inclusive OR function:

    Operand 1    1 0 1 0
    Operand 2    1 1 0 0
    Result       1 1 1 0

A logical difference is the exclusive OR function:

    Operand 1    1 0 1 0
    Operand 2    1 1 0 0
    Result       0 1 1 0

A logical equivalence is the exclusive NOR function:

    Operand 1    1 0 1 0
    Operand 2    1 1 0 0
    Result       1 0 0 1

The merge uses two operands and a mask to produce results as follows: 

Operand 1 10101010 

Operand 2 11001100 

Mask 11110000 

Result 10101100 

The bits of operand 1 pass where the mask bit is 1. The bits of operand 
2 pass where the mask bit is 0. 
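
The logical operations above can be sketched on Python integers. The
function names and the optional `width` parameter are assumptions made
for illustration; results are masked to 64 bits to mirror the 64-bit
quantities the logical units operate on.

```python
MASK64 = (1 << 64) - 1


def logical_product(a, b):
    return a & b & MASK64                     # AND


def logical_sum(a, b):
    return (a | b) & MASK64                   # inclusive OR


def logical_difference(a, b):
    return (a ^ b) & MASK64                   # exclusive OR


def logical_equivalence(a, b, width=64):
    # Exclusive NOR; width limits the result so short examples stay readable.
    return ~(a ^ b) & ((1 << width) - 1)


def merge(op1, op2, mask):
    # Bits of op1 pass where mask is 1; bits of op2 pass where mask is 0.
    return ((op1 & mask) | (op2 & ~mask)) & MASK64


print(format(merge(0b10101010, 0b11001100, 0b11110000), "08b"))
```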
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CPU INSTRUCTIONS 



This section explains the instruction formats and the specific
instructions for the CRAY X-MP four-processor computer systems.



INSTRUCTION FORMAT 

Each instruction used in the computer is either a 1-parcel (16-bit) 
instruction or a 2-parcel (32-bit) instruction. Instructions are packed 
4 parcels per word. Parcels in a word are numbered 0 through 3 from left
to right and any parcel position can be addressed in branch 
instructions. A 2-parcel instruction begins in any parcel of a word and 
can span a word boundary. For example, a 2-parcel instruction beginning 
in the fourth parcel of a word ends in the first parcel of the next 
word. No padding to word boundaries is required. Figure 5-1 illustrates 
the general form of instructions. 



            First Parcel                 Second Parcel

    |  g  |  h  |  i  |  j  |  k  |           m           |

       4     3     3     3     3              16              Bits



Figure 5-1. General Form for Instructions 



Four variations of this general format use the fields differently; two 
forms are 1-parcel formats and two are 2-parcel formats. The formats of 
these four variations are described below. 
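
As an illustration of the general form, the sketch below splits a
16-bit first parcel into its g, h, i, j, and k fields using the bit
widths from figure 5-1 (4, 3, 3, 3, 3). The function name and the
dictionary result are assumptions made for this example only.

```python
def decode_parcel(parcel):
    """Split a 16-bit parcel into the g, h, i, j, k fields."""
    if not 0 <= parcel <= 0xFFFF:
        raise ValueError("a parcel is 16 bits")
    return {
        "g": (parcel >> 12) & 0o17,   # 4 bits
        "h": (parcel >> 9) & 0o7,     # 3 bits
        "i": (parcel >> 6) & 0o7,     # 3 bits
        "j": (parcel >> 3) & 0o7,     # 3 bits
        "k": parcel & 0o7,            # 3 bits
    }


# Octal code 001423: g=0, h=1, i=4, j=2, k=3.
fields = decode_parcel(0o001423)
print(fields)
```

Because g is 4 bits and h is 3 bits, the combined 7-bit gh field is
written as three octal digits, followed by one octal digit each for i,
j, and k.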



1-PARCEL INSTRUCTION FORMAT WITH DISCRETE j AND k FIELDS 

The most common of the 1-parcel instruction formats uses the i, j, 
and k fields as individual designators for operand and result registers 
(refer to figure 5-2). The g and h fields define the operation 
code. The i field designates a result register and the j and k
fields designate operand registers. Some instructions ignore one or more
of the i, j, and k fields. The following types of instructions use
this format:

• Arithmetic 

• Logical 

• Double shift 

• Floating-point constant 



    |  g  |  h  |  i  |  j  |  k  |

       4     3     3     3     3        Bits

     Operation        Register
     Code             Designators



Figure 5-2. 1-parcel Instruction Format with Discrete 
j and k Fields 



1-PARCEL INSTRUCTION FORMAT WITH COMBINED j AND k FIELDS 

Some 1-parcel instructions use the j and k fields as a combined 6-bit 
field (refer to figure 5-3). The g and h fields contain the 
operation code, and the i field is generally a destination register 
identifier. The combined j and k fields generally contain a constant 
or a B or T register designator. The branch instruction 005 and the 
following types of instructions use the 1-parcel instruction format with 
combined j and k fields. 



• Constant

• B and T register block memory transfer

• B and T register data transfer

• Single shift

• Mask



    |  g  |  h  |  i  |    jk    |

       4     3     3       6          Bits

     Operation    Result    Constant or
     Code         Register  Register
                            Designator



Figure 5-3. 1-parcel Instruction Format with Combined 
j and k Fields



2-PARCEL INSTRUCTION FORMAT WITH COMBINED j, k, AND m FIELDS

The instruction type for a 22-bit immediate constant uses the combined 
j, k, and m fields to hold the constant. The 7-bit gh field contains 
an operation code, and the 3-bit i field designates a result register. 
The instruction type using this format transfers the 22-bit jkm constant 
to an A or S register. 

The instruction type used for scalar memory transfers also requires a 
22-bit jkm field for an address displacement. This instruction type uses 
the 4-bit g field for an operation code, the 3-bit h field to designate 
an address index register, and the 3-bit i field to designate a source or 
result register. (Refer to the subsection on Special Register Values.) 

Figure 5-4 shows the two general applications for the 2-parcel instruction 
format with combined j, k, and m fields. 



            First Parcel                 Second Parcel

    |  g  |  h  |  i  |  j  |  k  |           m           |

       4     3     3    |------------- 22 -------------|      Bits

     Operation    Result              Constant
     Code         Register


            First Parcel                 Second Parcel

    |  g  |  h  |  i  |  j  |  k  |           m           |

       4     3     3    |------------- 22 -------------|      Bits

     Oper- Address Source or          Address or
     ation Register Result            Displacement
     Code  Used as  Register
           Index



Figure 5-4. 2-parcel Instruction Format with Combined 
j, k, and m Fields



2-PARCEL INSTRUCTION FORMAT WITH COMBINED i, j, k, AND m FIELDS

The 2-parcel instruction type for a branch (figure 5-5) uses the combined
i, j, k, and m fields to contain the 24-bit address that allows
branching to an instruction parcel. A 7-bit operation code (gh) is
followed by an ijkm field. The high-order bit of the i field is clear.

The 2-parcel instruction type for a 24-bit immediate constant
(figure 5-6) uses the combined i, j, k, and m fields to hold the
constant. This instruction type uses the 4-bit g field for an
operation code and the 3-bit h field to designate the result address
register. The high-order bit of the i field is set.



            First Parcel                 Second Parcel

    |  g  |  h  |               ijkm                      |

       4     3     1            22              2             Bits

     Operation   Clear        Address         Parcel
     Code        Bit                          Select



Figure 5-5. 2-parcel Instruction Format for a Branch with 
Combined i, j, k, and m Fields 



            First Parcel                 Second Parcel

    |  g  |  h  |               ijkm                      |

       4     3     1                 24                       Bits

     Oper- Result  Set            Constant
     ation Register Bit
     Code



Figure 5-6. 2-parcel Instruction Format for a 24-bit Immediate 
Constant with Combined i, j, k, and m Fields 
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SPECIAL REGISTER VALUES 

If the S0 and A0 registers are referenced in the j or k fields of an
instruction, the respective register contents are not used; instead, a
special operand is generated. The special value is available regardless
of existing A0 or S0 reservations (which, in this case, are not
checked). This use does not alter the actual value of the S0 or A0
register. If S0 or A0 is used in the i field as the operand, the
actual value of the register is provided. Table 5-1 shows the special
register values.



Table 5-1. Special Register Values 



    Field          Operand Value

    Ah, h=0        0

    Ai, i=0        (A0)

    Aj, j=0        0
    Ak, k=0        1

    Si, i=0        (S0)

    Sj, j=0        0
    Sk, k=0        2^63
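
The special operand rule can be sketched as a lookup. The function
names are hypothetical, and the values used (0 for Ah, Aj, and Sj with
a zero designator; 1 for Ak; 2^63 for Sk) are assumed readings of
table 5-1. Register files are modelled as plain Python lists.

```python
def a_operand(field, designator, a_regs):
    """Operand supplied for an A register reference in field h, i, j, or k."""
    if designator != 0 or field == "i":
        return a_regs[designator]       # actual register contents
    return 1 if field == "k" else 0     # special value, A0 is not read


def s_operand(field, designator, s_regs):
    """Operand supplied for an S register reference in field i, j, or k."""
    if designator != 0 or field == "i":
        return s_regs[designator]       # actual register contents
    return (1 << 63) if field == "k" else 0   # special value, S0 is not read


a_regs = [5, 10, 20, 30, 40, 50, 60, 70]   # example contents, A0 = 5
s_regs = [9, 1, 2, 3, 4, 5, 6, 7]          # example contents, S0 = 9
print(a_operand("k", 0, a_regs), s_operand("k", 0, s_regs))
```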



INSTRUCTION ISSUE 

Instructions are read 1 parcel at a time from the instruction buffers and 
delivered to the Next Instruction Parcel (NIP) register. The instruction 
is then passed to the Current Instruction Parcel (CIP) register when the 
previous instruction issues. An instruction in the CIP register issues 
when conditions in the functional unit and registers are such that 
functions required for execution can be performed without conflicting 
with a previously issued instruction. Instruction parcels can issue out 
of the CIP register at a maximum rate of one per clock period (CP). 

Execution times (the time from issue to delivery of data to the 
destination operating registers) are fixed for instructions 000 through 
077, except those that reference memory (instructions 000, 004, branch 
instructions 005 through 017, and block transfer instructions 034 through 
037). Scalar memory instructions 100 through 137 complete in variable 
lengths of time. Vector operation instructions 140 through 177 complete 
in a fixed time if the instructions are not chained to memory fetches.

Execution times can be affected by instruction 0034jk, which tests and
sets the semaphore designated by jk. If the semaphore is set,
instruction issue is held until another CPU clears that semaphore. If
the semaphore is clear, the instruction issues and sets the semaphore.
If all CPUs in a cluster are holding issue on a test and set, a flag is
set in the Exchange Package (if not in monitor mode) and an exchange
occurs. If an interrupt occurs while a test and set instruction is
holding in the CIP register, a flag is set in the Exchange Package, CIP
and NIP registers clear, and an exchange occurs with the P register
pointing to the test and set instruction.

Entry to the NIP register is blocked for the second parcel of a 2-parcel 
instruction, leaving NIP blanked. Instead, the parcel is delivered to 
the Lower Instruction Parcel (LIP) register. The zeros in NIP (the 
pseudo second parcel) are transferred to CIP and issued as a do-nothing 
instruction. 

When special register values (A0 or S0) are selected by an instruction
for Ah, Aj, Ak, Sj, or Sk, the normal hold issue until operand
ready conditions do not apply. These values are always immediately
available.



INSTRUCTION DESCRIPTIONS 

This section contains detailed information about individual instructions 
or groups of related instructions. Each instruction begins with boxed 
information consisting of the Cray Assembly Language (CAL) syntax format, 
a brief description of each instruction, and the octal code sequence 
defined by the gh fields. The appearance of an m in a format 
designates an instruction consisting of 2 parcels. 

Following the boxed information is a more detailed description of the 
instruction or instructions, including a list of hold issue conditions, 
execution time, and special cases. Hold issue conditions refer to those 
conditions delaying issue of an instruction until conditions are met. 

Instruction issue time assumes that if an instruction issues at CP n,
the next instruction issues at CP n + issue time† if its own issue
conditions have been met.



† Previous instruction issued
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The following special characters can appear in the operand field 
description of symbolic machine instructions and are used by the 
assembler in determining the operation to be performed. 

Character Description 

+ Arithmetic sum of adjoining registers 

- Arithmetic difference of adjoining registers 

* Arithmetic product of adjoining registers 
/ Division or reciprocal 

# Use ones complement 

> Shift value or form mask from left to right 

< Shift value or form mask from right to left 

& Logical product of adjoining registers 

! Logical sum of adjoining registers 

\ Logical difference of adjoining registers 

In some instructions, register designators are prefixed by the following 
letters, which have special meaning to the assembler. 

Letter Description 

F Floating-point operation 

H Half-precision operation 

R Rounded operation 

I Reciprocal iteration 

P Population count 

Q Population count parity 

Z Leading zero count 



******************************************************* 

CAUTION 

Instructions with g, h, i, j, k, and m fields not 
explicitly described in the following instructions may 
produce indeterminate results. 

******************************************************* 
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INSTRUCTION 000 



CAL Syntax Description Octal Code 



ERR Error exit 000000 



Instruction 000 is treated as an error condition and an exchange sequence 
occurs. Content of the instruction buffers is voided by the exchange 
sequence. Instruction 000 halts execution of an incorrectly coded 
program branching into an unused area of memory (if memory was 
backgrounded with zeros) or into a data area (if the data is positive 
integers, right-justified ASCII, or floating-point zero). If monitor 
mode is not in effect, the Error Exit flag in the F register is set. All 
instructions issued before this instruction are run to completion. When 
results of previously issued instructions arrive at the operating 
registers, an exchange occurs to the Exchange Package designated by the 
Exchange Address (XA) register contents. The program address stored 
during the exchange on the terminating exchange sequence is the contents 
of the P register advanced by one count (that is, the address of the 
instruction following the error exit instruction).



HOLD ISSUE CONDITIONS: Any A, S, or V register reserved 

EXECUTION TIME: Instruction issue, 40 CPs; this time includes an
exchange sequence (24 CPs) and a fetch operation
(16 CPs).

SPECIAL CASES: None 
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INSTRUCTIONS 0010 - 0013 



    CAL Syntax   Description                                      Octal Code

    CA,Aj Ak     Set the Current Address (CA) register for the    0010jk
                 channel indicated by (Aj) to (Ak) and activate
                 the channel

    CL,Aj Ak     Set the Limit Address (CL) register for the      0011jk
                 channel indicated by (Aj) to (Ak)

    CI,Aj        Clear the interrupt flag and error flag for      0012j0
                 the channel indicated by (Aj); clear device
                 master-clear (output channel).

    MC,Aj        Clear the interrupt flag and error flag for      0012j1
                 the channel indicated by (Aj); set device
                 master-clear (output channel); clear device
                 ready-held (input channel).

    XA Aj        Enter the XA register with (Aj)                  0013j0



Instructions 0010 through 0013 are privileged to monitor mode and provide 
operations useful to the operating system. Functions are selected 
through the i designator. Instructions are treated as pass 
instructions if the monitor mode bit is not set. 

When the i designator is 0, 1, or 2, the instruction controls operation
of the I/O channels. Each channel has two registers directing the
channel activity. The CA register for a channel contains the address of
the current channel word. The CL register specifies the limit address.
In programming the channel, the CL register is initialized first and then
CA is set, activating the channel. As transfer continues, CA is
incremented toward CL. When (CA) is equal to (CL), transfer is complete
for words at initial (CA) through (CL) - 1. When the j designator is
0 or when the 5 low-order bits of (Aj) are less than 6_8, the functions
are executed as pass instructions. Valid channel numbers are 6_8 through
17_8. When the k designator is 0, CA or CL is set to 1.

When the i designator is 3, the instruction transmits bits 2^11
through 2^4 of (Aj) to the XA register. When the j designator is 0,
the XA register is cleared.

Instruction 0012j0 is used to clear the device Master Clear. For 
instruction 0012, if the k designator is 1 for an output channel, the 
master clear is set; if the k designator is 1 for an input channel, the 
Ready flag is cleared. 
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INSTRUCTIONS 0010 - 0013 (continued) 

HOLD ISSUE CONDITIONS:  For instructions 0010 and 0011, Aj or Ak
                        reserved (except A0)

                        For instructions 0012 or 0013, Aj reserved
                        (except A0)

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          If the program is not in monitor mode, the
                        instruction becomes a no-op although all hold
                        issue conditions remain effective.

                        For instructions 0010, 0011, and 0012:
                        If j=0, the instruction is a no-op.
                        If k=0, CA or CL is set to 1.
                        If the 5 low-order bits of (Aj) are less than
                        6_8, the instruction is a no-op. If the 5
                        low-order bits of (Aj) are greater than
                        17_8, undetermined results can occur. (That
                        is, 6_8 through 17_8 are valid, 20_8 through
                        37_8 are undetermined, 46_8 through 57_8 are
                        valid, and so on.)

                        For instruction 0012:
                        The correct priority interrupting channel
                        number cannot be read (through instruction 033)
                        until 6 CPs after issue of instruction 0012.

                        For instruction 0013:
                        If j=0, XA register is cleared.



NOTE 

Because there is no hardware interlock among 
CPUs, it is possible to have more than one CPU 
issuing these instructions at the same time; 
however, undetermined results occur. 

Software must ensure only one CPU is servicing 
I/O at a time while in monitor mode. 
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INSTRUCTION 0014 



    CAL Syntax   Description                                      Octal Code

    RT Sj        Enter the Real-time Clock (RTC) register         0014j0
                 with (Sj)

    IP,j 1       Set interprocessor interrupt request of CPU j    0014j1

    IP 0         Clear received interprocessor interrupt          001402
                 request from all other processors

    CLN 0        Cluster number = 0                               001403

    CLN 1        Cluster number = 1                               001413

    CLN 2        Cluster number = 2                               001423

    CLN 3        Cluster number = 3                               001433

    CLN 4        Cluster number = 4                               001443

    CLN 5        Cluster number = 5                               001453

    PCI Sj       Enter Interrupt Interval (II) register           0014j4
                 with (Sj)

    CCI          Clear the programmable clock interrupt request   001405

    ECI          Enable programmable clock interrupt request      001406

    DCI          Disable programmable clock interrupt request     001407



Instruction 0014 performs specialized functions for managing the 
real-time and programmable clocks and handles interprocessor interrupt 
requests and cluster number operations. Instruction 0014 is privileged 
to monitor mode and is treated as a pass instruction if the monitor mode 
bit is not set. 

When the k designator is 0, the instruction loads the Sj register
contents into the RTC register. When the j designator is 0 or
(Sj)=0, the RTC register is cleared.

When the k designator is 1, the instruction sets the internal CPU 
interrupt request in the CPU associated with PN=j. If the CPU 
associated with PN=j is not in monitor mode, the Interrupt from 
Internal CPU (ICP) flag sets in the F register causing an interrupt. 
The request remains until cleared by the receiving CPU issuing 
instruction 001402. If the CPU associated with PN=j attempts to 
interrupt itself, the instruction becomes a no-op. 

When the k designator is 2, the instruction clears the internal CPU 
interrupt request set by any other CPU. 

When the k designator is 3, the instruction sets the cluster number to 
j to make the following cluster selections: 

CLN = 0 No cluster; all shared register and semaphore operations
are no-ops (except SB, ST, or SM register reads, which
return a value to Ai or Si).

CLN = 1 Cluster 1

CLN = 2 Cluster 2

CLN = 3 Cluster 3

CLN = 4 Cluster 4

CLN = 5 Cluster 5

Clusters 1, 2, 3, 4, and 5 each have a separate set of SM, SB, and
ST registers.

When the k designator is 4, the instruction loads the low-order 32
bits from the Sj register into both the II register and the Interrupt
Countdown (ICD) counter. When the j designator is 0 or (Sj)=0, II
and ICD are cleared.

When the k designator is 5, the instruction clears the programmable
clock interrupt request if the request is previously set by ICD counting
down to 0.

When the k designator is 6, the instruction enables repeated 
programmable clock interrupt requests at a repetition rate determined by 
the value stored in the II register. 

When the k designator is 7, the instruction disables repeated 
programmable clock interrupt requests until an instruction 001406 is 
executed to enable the requests. 




HOLD ISSUE CONDITIONS:  Sj reserved (except S0)

                        For instruction 0014j3, hold issue 2+ CPs†

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          If the program is not in monitor mode, these
                        instructions become no-ops but all hold issue
                        conditions remain effective.

                        For instructions 0014j0 and 0014j4, if j=0,
                        (Sj)=0.

                        For instruction 0014j0, the value is entered
                        into the RTC register 4 CPs after instruction
                        0014j0 issues.

                        For instruction 0014j1, if the processor number
                        equals j of the CPU issuing this instruction,
                        the instruction becomes a no-op. (A CPU cannot
                        interrupt itself if j equals the processor
                        number of the CPU issuing this instruction.)



† If more than one CPU attempts to access semaphores or shared registers
in the same CP, a scanner resolves the conflict. Refer to shared
register explanation in section 2.
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INSTRUCTION 0015 



    CAL Syntax   Description                                      Octal Code

    †            Select performance monitor                       0015j0

    †            Set maintenance read mode                        001501

    †            Load diagnostic check byte with S1               001511

    †            Set maintenance write mode 1                     001521

    †            Set maintenance write mode 2                     001531



† Not currently supported



These instructions are all privileged to monitor mode. 

Instruction 0015j0 selects one of four groups of hardware-related
events to be monitored by the performance counters. Refer to appendix C 
for a description of how performance monitoring is accomplished. 

Instructions 001501 through 001531 are used to check the operation of the 
modules concerned with SECDED and to verify error detection and 
correction. The maintenance mode switch on the mainframe's control panel 
must be switched on during execution of these instructions or they become 
no-ops. Refer to appendix D for a description of SECDED maintenance mode 
functions. 

Instructions 001501 and 001521 are used to verify check bit memory 
storage. Instruction 001501 allows the 8 check bits for SECDED to 
replace certain data bit positions in any subsequent memory read for the 
CPU path (including fetch and I/O). Instruction 001521 allows certain 
write data bits to replace the 8 check bits for SECDED for any subsequent 
CPU write to memory. 

Instructions 001511 and 001531 are used to verify error detection and 
correction. Instruction 001511 loads a diagnostic check byte with the
high-order 8 bits of S1. Instruction 001531 enables a diagnostic check
byte to replace the 8 check bits for SECDED being written into memory for 
any subsequent write to memory. 
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INSTRUCTION 0020 



    CAL Syntax   Description                                      Octal Code

    VL Ak        Transmit (Ak) to Vector Length (VL) register     00200k

    VL 1†        Transmit 1 to VL register                        002000



† Special CAL syntax



Instruction 00200k enters the VL register with a value determined by
the contents of Ak. The low-order 6 bits of (Ak) are entered into
the VL register. The 7th bit of VL is set if the 6 low-order bits of
(Ak)=0.

For example, if (Ak)=0 or a multiple of 100_8, then VL=100_8. The
content of VL is always between 1 and 100_8.

Instruction 002000 transmits the value of 1 to the VL register. 
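
The rule for loading VL can be written as a short sketch. The function
name is hypothetical and (Ak) is modelled as a non-negative Python
integer.

```python
def vector_length(ak_contents):
    """Value entered into VL for a given (Ak).

    The low-order 6 bits are used; if they are all zero, the 7th bit
    is set instead, giving 100 octal (64 decimal).
    """
    low6 = ak_contents & 0o77
    return 0o100 if low6 == 0 else low6


for value in (0, 1, 5, 0o77, 0o100, 0o101, 0o200):
    print(oct(value), "->", vector_length(value))
```

With this rule the result is never 0, which matches the statement that
VL always lies between 1 and 100 octal.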



HOLD ISSUE CONDITIONS:  Ak reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP
                        VL register ready, 1 CP

SPECIAL CASES:          Maximum vector length is 64.
                        (Ak)=1 if k=0.
                        (VL)=100_8 if k≠0 and (Ak)=0 or a
                        multiple of 100_8.
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INSTRUCTIONS 0021 - 0027 



    CAL Syntax   Description                                      Octal Code

    EFI          Enable interrupt on floating-point error         002100

    DFI          Disable interrupt on floating-point error        002200

    ERI          Enable interrupt on operand (address)            002300
                 range error

    DRI          Disable interrupt on operand (address)           002400
                 range error

    DBM          Disable bidirectional memory transfers           002500

    EBM          Enable bidirectional memory transfers            002600

    CMR          Complete memory references (CMR)                 002700



Instruction 002100 sets the Floating-point Mode flag in the M register. 
Instruction 002200 clears the Floating-point Mode flag in the M 
register. The two instructions do not check the previous state of the 
flag. When set, the Floating-point Mode flag enables interrupts on 
floating-point range errors as described in section 4. Issuing either of 
these instructions also clears the Floating-Point Error Status flag. 

Instruction 002300 sets the Operand Range Mode flag in the M register. 
Instruction 002400 clears the Operand Range Mode flag in the M register. 
The two instructions do not check the previous state of the flag. When 
set, the Operand Range Mode flag enables interrupts on operand (address) 
range errors as described in section 3. 

Instruction 002500 disables the bidirectional memory mode. Instruction 
002600 enables the bidirectional memory mode. Block reads and writes can 
operate concurrently in bidirectional memory mode. If the bidirectional 
memory mode is disabled, only block reads can operate concurrently. 

Instruction 002700 assures completion of all memory references within a 
particular CPU issuing the instruction. Instruction 002700 does not 
issue until all memory references before this instruction are at the 
stage of execution where completion occurs in a fixed amount of time. 
For example, a load of any data that has been stored by the CPU issuing 
instruction CMR, 002700, is assured of receiving the updated data if the 
load is issued after the CMR instruction. Synchronization of memory 
references between processors can be done by this instruction in 
conjunction with semaphore instructions.



HOLD ISSUE CONDITIONS:  Instructions 002500 and 002600, hold issue 2 CPs

                        Instruction 002700, Ports A, B, and C busy

                        Instruction 002700, scalar memory reference
                        active in CP 1, 2, or 3

                        Ak reserved (except A0)

EXECUTION TIME:         Instruction issue, 1 CP

SPECIAL CASES:          Instructions 002100 and 002200 are issued even
                        if there are other floating-point operations in
                        process resulting from previous issues. The
                        interrupts are enabled or disabled at CP + 1;
                        floating-point overflows occurring after that
                        time cause interrupts if they are enabled even
                        if the overflow is generated by a previously
                        issued floating-point instruction.

                        Instructions 002300 and 002400 are issued even
                        if there are other memory references in process
                        resulting from previous issues. The interrupts
                        are enabled or disabled at CP + 1; operand range
                        errors occurring after that time cause
                        interrupts if they are enabled even if the
                        operand range error is generated by a previous
                        memory reference.






INSTRUCTIONS 0030, 0034, 0036, and 0037 



    CAL Syntax   Description                                      Octal Code

    VM Sj        Transmit (Sj) to VM register                     0030j0

    VM 0†        Clear VM register                                003000

    SMjk 1,TS    Test and set semaphore jk, 0 ≤ jk ≤ 31 (decimal) 0034jk

    SMjk 0       Clear semaphore jk, 0 ≤ jk ≤ 31 (decimal)        0036jk

    SMjk 1       Set semaphore jk, 0 ≤ jk ≤ 31 (decimal)          0037jk



† Special CAL syntax



Instruction 0030j0 enters the VM register with the contents of Sj.
The VM register is cleared if the j designator is 0, as in instruction
003000. These instructions are used in conjunction with the vector merge
instructions (146 and 147) in which an operation is performed depending
on the contents of VM.

Instruction 0034jk tests and sets the semaphore designated by jk. If
the semaphore is set, issue is held until another CPU clears that
semaphore. If the semaphore is clear, the instruction issues and sets
the semaphore. If all CPUs in a cluster are holding issue on a test and
set, the DL flag is set in the Exchange Package (if not in monitor mode)
and an exchange occurs. If an interrupt occurs while a test and set
instruction is holding in the CIP register, the WS flag in the Exchange
Package sets, CIP and NIP registers clear, and an exchange occurs with
the P register pointing to the test and set instruction. The SM register
is 32 bits with SM0 being the most significant bit.

Instruction 0036jk clears the semaphore designated by jk.

Instruction 0037jk sets the semaphore designated by jk.
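
The semaphore operations can be modelled with a small sketch. The class
and method names are hypothetical. Hold issue is represented by a
`False` return value rather than by blocking, and the deadlock (DL) and
interrupt (WS) cases are not modelled.

```python
class SemaphoreRegister:
    """Model of a 32-bit SM register; SM0 is the most significant bit."""

    def __init__(self):
        self.bits = 0

    @staticmethod
    def _mask(jk):
        if not 0 <= jk <= 31:
            raise ValueError("semaphore number must be 0 through 31")
        return 1 << (31 - jk)

    def test_and_set(self, jk):
        """Like 0034jk: return True if the instruction could issue."""
        mask = self._mask(jk)
        if self.bits & mask:
            return False          # already set: issue would be held
        self.bits |= mask
        return True

    def clear(self, jk):
        """Like 0036jk."""
        self.bits &= ~self._mask(jk)

    def set(self, jk):
        """Like 0037jk."""
        self.bits |= self._mask(jk)


sm = SemaphoreRegister()
first_try = sm.test_and_set(0)    # semaphore was clear, so it is now set
second_try = sm.test_and_set(0)   # semaphore already set
print(first_try, second_try, hex(sm.bits))
```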



HOLD ISSUE CONDITIONS: For instruction 0030 JO: 

Sj reserved (except SO) 

Instruction 003 in process, unit busy 1 CP 

Instruction 14x in process, unit busy (VL) + 

5 CPs 

Instruction 175 in process, unit busy (VL) + 

5 CPs 



HR-0097 5-18 



INSTRUCTIONS 0030, 0034, 0036, and 0037 (continued)

HOLD ISSUE CONDITIONS: For instructions 0034jk, 0036jk, and 0037jk:
(continued)
                       Hold issue 1+ CP †

                       For instruction 0034jk:

                       If current Cluster Number ≠ 0 and SMjk is
                       set, holds issue until other CPU in the same
                       cluster clears the semaphore.

EXECUTION TIME:        Instruction issue, 1 CP

SPECIAL CASES:         (Sj)=0 if j=0.

                       Instructions 0034jk, 0036jk, and 0037jk
                       are no-ops if CLN=0.



† If more than one CPU attempts to access semaphores or shared registers
in the same CP, a scanner resolves the conflict. Refer to shared 
register explanation in section 2. 
INSTRUCTION 004

CAL Syntax   Description    Octal Code

EX           Normal exit    004000



Instruction 004 causes an exchange sequence which voids the contents of 
the instruction buffers. If monitor mode is not in effect, the Normal 
Exit flag in the F register is set. All instructions issued before this 
instruction are run to completion; that is, when all results arrive at 
the operating registers because of previously issued instructions, an 
exchange sequence occurs to the Exchange Package designated by the XA 
register contents. The program address stored into the Exchange Package 
is advanced one count from the address of the normal exit instruction. 
Instruction 004 is used to issue a monitor request from a user program. 



HOLD ISSUE CONDITIONS: Any A, S, or V register reserved

EXECUTION TIME:        Instruction issue, 40 CPs; this time includes an
                       exchange sequence (24 CPs) and a fetch operation
                       (16 CPs).

SPECIAL CASES:         None
INSTRUCTION 005

CAL Syntax   Description        Octal Code

J Bjk        Branch to (Bjk)    0050jk



Instruction 005 sets the P register to the 24-bit parcel address 
specified by the contents of Bjk causing execution to continue at that 
address. The instruction is used to return from a subroutine. 



HOLD ISSUE CONDITIONS: Instruction 034 or 035 in process

                       Instruction 025 issued in the previous CP

                       Second parcel in a different buffer, 2-CP delay

                       Second parcel not in a buffer

EXECUTION TIME:        Instruction issue:

                       Instruction parcel and following parcel both
                       in a buffer and branch address in a buffer,
                       7 CPs

                       Instruction parcel and following parcel both
                       in a buffer and branch address not in a
                       buffer, 18 CPs. Additional time is needed if
                       a memory conflict exists; the time to resolve
                       a memory conflict depends on factors present.

SPECIAL CASES:         Instruction 0050jk executes as if it were a
                       2-parcel instruction. Even though the parcel
                       following the first parcel of instruction
                       0050jk is not used, it can cause a delay of
                       instruction 0050jk if it is out of buffer.
                       Refer to execution times.
INSTRUCTION 006

CAL Syntax   Description       Octal Code

J exp        Branch to ijkm    006ijkm



The 2-parcel instruction 006 sets the P register to the parcel address 
specified by the low-order 24 bits of the ijkm field. Execution 
continues at that address. The high-order bit of the ijkm field is 
ignored. 



HOLD ISSUE CONDITIONS: 



Second parcel in different buffer, 2-CP delay 
Second parcel not in a buffer 



EXECUTION TIME: 



Instruction issue: 

Both parcels of instruction in the same buffer 
and branch address in a buffer, 5 CPs 



Both parcels of instruction in the same buffer 
and branch address not in a buffer, 16 CPs. 
Additional time is needed if a memory conflict 
exists. The time to resolve a memory conflict 
depends on factors present. 



SPECIAL CASES: 



None 
INSTRUCTION 007

CAL Syntax   Description                                 Octal Code

R exp        Return jump to ijkm; set B00 to (P) + 2.    007ijkm



The 2-parcel instruction 007 sets register B00 to the address of the
parcel following the second parcel of the instruction. The P register is
then set to the parcel address specified by the low-order 24 bits of the
ijkm field. Execution continues at that address. The high-order bit
of the ijkm field is ignored. This instruction provides a return
linkage for subroutine calls. The subroutine is entered through a return
jump. The subroutine can return to the caller at the instruction
following the call by executing a branch to the B00 register contents.
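
The call and return linkage can be illustrated with a short Python sketch. The function names are hypothetical, and wrapping (P) + 2 to 24 bits is an assumption of this model; only the address arithmetic is shown, not instruction fetch or timing.

```python
ADDR_MASK = (1 << 24) - 1  # parcel addresses are 24 bits wide


def return_jump(p, ijkm):
    """Model of 007ijkm: return (new_p, b00).

    b00 is the address of the parcel after the 2-parcel instruction;
    the high-order (25th) bit of ijkm is ignored.
    """
    b00 = (p + 2) & ADDR_MASK  # assumption: the sum wraps at 24 bits
    return ijkm & ADDR_MASK, b00


def branch_to_b(b_value):
    """Model of 0050jk: P takes the 24-bit contents of Bjk."""
    return b_value & ADDR_MASK
```

A subroutine entered with return_jump(p, target) comes back to the caller by applying branch_to_b to the saved B00 value.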



HOLD ISSUE CONDITIONS:



Instruction 034 or 035 in process 

Second parcel in a different buffer, 2-CP delay 

Second parcel not in a buffer 



EXECUTION TIME: 



Instruction issue: 

Both parcels of instruction in the same buffer 
and branch address in a buffer, 5 CPs 



Both parcels of instruction in the same buffer 
and branch address not in a buffer, 16 CPs. 
Additional time is needed if a memory conflict 
exists. The time to resolve a memory conflict 
depends on factors present. 



SPECIAL CASES: 



None 
INSTRUCTIONS 010 - 013

CAL Syntax   Description                                     Octal Code

JAZ exp      Branch to ijkm if (A0)=0 (i₂=0)                 010ijkm

JAN exp      Branch to ijkm if (A0)≠0 (i₂=0)                 011ijkm

JAP exp      Branch to ijkm if (A0) positive, includes       012ijkm
             (A0)=0 (i₂=0)

JAM exp      Branch to ijkm if (A0) negative (i₂=0)          013ijkm



The 2-parcel instructions 010 through 013 test the contents of A0 for the
condition specified by the h field. If the condition is satisfied, the 
P register is set to the parcel address specified by the low-order 24 
bits of the ijkm field and execution continues at that address. The 
high-order bit of the ijkm field must be 0. If the condition is not 
satisfied, execution continues with the instruction following the branch 
instruction. 



HOLD ISSUE CONDITIONS: A0 busy in any one of the previous 3 CPs

                       Second parcel in a different buffer, 2-CP delay

                       Second parcel not in a buffer

EXECUTION TIME:        Instruction issue for branch taken:

                       Both parcels of instruction in the same buffer,
                       branch taken, and branch address in a buffer;
                       5 CPs.

                       Both parcels of instruction in the same buffer,
                       branch taken, and branch address not in a
                       buffer; 16 CPs. Additional time is needed if a
                       memory conflict exists. The time to resolve a
                       memory conflict is indeterminate.

                       Both parcels of instruction in different
                       buffers, branch taken, and branch address in a
                       buffer; 7 CPs.

                       Both parcels of instruction in different
                       buffers, branch taken, and branch address not
                       in a buffer; 18 CPs.

                       Second parcel of instruction not in a buffer,
                       branch taken, and branch address in a buffer;
                       18 CPs.

                       Second parcel of instruction not in a buffer,
                       branch taken, and branch address not in buffer;
                       29 CPs.

                       Instruction issue for branch not taken:

                       Both parcels of instruction in the same buffer,
                       branch not taken, and next instruction in the
                       same instruction buffer; 2 CPs.

                       Both parcels of instruction in the same buffer,
                       branch not taken, and next instruction in
                       different instruction buffer; 4 CPs.

                       Both parcels of instruction in the same buffer
                       and branch not taken with next instruction in
                       memory; 16 CPs.

                       Both parcels of instruction in different
                       buffers and branch not taken; 4 CPs.

                       Second parcel of instruction not in a buffer
                       and branch not taken; 15 CPs.

                       NOTE

                       Whenever a fetch occurs, memory conflicts may
                       produce a delay.

SPECIAL CASES:         (A0)=0 is considered a positive condition.

                       High-order bit of i designator (i₂) must be 0.
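
The four branch conditions can be sketched in Python as follows. This is an illustrative model only: the function name is hypothetical, the register is passed as an unsigned integer, and "positive" is assumed to mean that the high-order (sign) bit is 0, which is consistent with (A0)=0 counting as positive. The same function covers the 014 through 017 tests on S0 when called with a 64-bit width.

```python
def condition_met(opcode, value, bits):
    """Return True if the conditional branch would be taken.

    opcode: 0o10..0o17 (010-013 test A0, 014-017 test S0)
    value:  unsigned register contents
    bits:   register width, 24 for A0 or 64 for S0
    """
    sign = (value >> (bits - 1)) & 1
    test = opcode & 0o3          # low two bits select the condition
    if test == 0:
        return value == 0        # JAZ / JSZ
    if test == 1:
        return value != 0        # JAN / JSN
    if test == 2:
        return sign == 0         # JAP / JSP: zero counts as positive
    return sign == 1             # JAM / JSM
```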
INSTRUCTIONS 014 - 017

CAL Syntax   Description                                     Octal Code

JSZ exp      Branch to ijkm if (S0)=0 (i₂=0)                 014ijkm

JSN exp      Branch to ijkm if (S0)≠0 (i₂=0)                 015ijkm

JSP exp      Branch to ijkm if (S0) positive, includes       016ijkm
             (S0)=0 (i₂=0)

JSM exp      Branch to ijkm if (S0) negative (i₂=0)          017ijkm



The 2-parcel instructions 014 through 017 test the contents of S0 for the
condition specified by the h field. If the condition is satisfied, the 
P register is set to the parcel address specified by the low-order 24 
bits of the ijkm field and execution continues at that address. The 
high-order bit of the ijkm field must be 0. If the condition is not 
satisfied, execution continues with the instruction following the branch 
instruction. 



HOLD ISSUE CONDITIONS: S0 busy in any one of the previous 3 CPs

                       Second parcel in a different buffer, 2-CP delay

                       Second parcel not in a buffer

EXECUTION TIME:        Instruction issue for branch taken:

                       Both parcels of instruction in the same buffer,
                       branch taken, and branch address in a buffer;
                       5 CPs.

                       Both parcels of instruction in the same buffer,
                       branch taken, and branch address not in a
                       buffer; 16 CPs. Additional time is needed if a
                       memory conflict exists. The time to resolve a
                       memory conflict is indeterminate.

                       Both parcels of instruction in different
                       buffers, branch taken, and branch address in a
                       buffer; 7 CPs.

                       Both parcels of instruction in different
                       buffers, branch taken, and branch address not
                       in a buffer; 18 CPs.

                       Second parcel of instruction not in a buffer,
                       branch taken, and branch address in a buffer;
                       18 CPs.

                       Second parcel of instruction not in a buffer,
                       branch taken, and branch address not in buffer;
                       29 CPs.

                       Instruction issue for branch not taken:

                       Both parcels of instruction in the same buffer,
                       branch not taken, and next instruction in the
                       same instruction buffer; 2 CPs.

                       Both parcels of instruction in the same buffer,
                       branch not taken, and next instruction in
                       different instruction buffer; 4 CPs.

                       Both parcels of instruction in the same buffer
                       and branch not taken with next instruction in
                       memory; 16 CPs.

                       Both parcels of instruction in different
                       buffers and branch not taken; 4 CPs.

                       Second parcel of instruction not in a buffer
                       and branch not taken; 15 CPs.

                       NOTE

                       Whenever a fetch occurs, memory conflicts may
                       produce a delay.

SPECIAL CASES:         (S0)=0 is considered a positive condition.

                       High-order bit of i designator (i₂) must be 0.
INSTRUCTION 01h

CAL Syntax   Description                     Octal Code

Ah exp       Transmit ijkm to Ah (i₂=1)      01hijkm



The 2-parcel instruction 01h enters a 24-bit value into Ah that is
composed of the low-order 24 bits of the ijkm field. The high-order
bit of the ijkm field must be set to distinguish the 01h instruction
from the 010 through 017 branches.



HOLD ISSUE CONDITIONS: Ah reserved

                       Second parcel not in a buffer

                       Second parcel in a different buffer

EXECUTION TIME:        Instruction issue:

                       Both parcels in same buffer, 2 CPs

                       Both parcels in different buffers, 4 CPs

                       Ah ready, 1 CP

SPECIAL CASES:         High-order bit of i designator (i₂) must be 1
INSTRUCTIONS 020 - 021

CAL Syntax   Description                               Octal Code

Ai exp       Transmit jkm to Ai                        020ijkm

Ai exp       Transmit ones complement of jkm to Ai     021ijkm



The 2-parcel instruction 020 enters a 24-bit value into Ai composed of
the 22-bit jkm field and 2 high-order bits of 0.

The 2-parcel instruction 021 enters into Ai a 24-bit value that is the
complement of a value formed by the 22-bit jkm field and 2 high-order
bits of 0. The complement is formed by changing all 1 bits to 0 and all
0 bits to 1. Thus, for instruction 021, the high-order 2 bits of Ai
are set to 1. The instruction provides a means of entering a negative
value into Ai. If the instruction is used to enter a negative number,
however, the positive number used in the jkm field must be one
smaller than the absolute value of the expected final negative number.
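
The "one smaller" rule follows from the difference between ones and twos complement, and a short Python sketch makes it concrete. The function names are hypothetical and the registers are modeled as plain unsigned integers.

```python
A_MASK = (1 << 24) - 1     # A registers are 24 bits
JKM_MASK = (1 << 22) - 1   # the jkm field is 22 bits


def load_020(jkm):
    """Model of 020ijkm: jkm with 2 high-order bits of 0."""
    return jkm & JKM_MASK


def load_021(jkm):
    """Model of 021ijkm: ones complement of the zero-extended jkm."""
    return ~(jkm & JKM_MASK) & A_MASK


def as_signed24(x):
    """Interpret a 24-bit pattern as a twos complement integer."""
    return x - (1 << 24) if x & (1 << 23) else x


def jkm_for_negative(n):
    """jkm value for which 021 produces the negative integer n."""
    if not -(1 << 22) <= n < 0:
        raise ValueError("n must be in -2**22 .. -1")
    return -n - 1  # one smaller than the absolute value of n
```

For example, entering -5 requires jkm = 4, because the ones complement of 4 is the twos complement representation of -5.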



HOLD ISSUE CONDITIONS: Ai reserved

                       Second parcel not in a buffer

EXECUTION TIME:        Instruction issue:

                       Both parcels in same buffer, 2 CPs

                       Both parcels in different buffers, 4 CPs

                       Ai ready, 1 CP

SPECIAL CASES:         None
INSTRUCTION 022

CAL Syntax   Description          Octal Code

Ai exp       Transmit jk to Ai    022ijk



Instruction 022 enters the 6-bit quantity from the jk field into the
low-order 6 bits of Ai. The high-order 18 bits of Ai are zeroed. No
sign extension occurs.



HOLD ISSUE CONDITIONS: Ai reserved

EXECUTION TIME:        Instruction issue, 1 CP

                       Ai ready, 1 CP

SPECIAL CASES:         None
INSTRUCTION 023

CAL Syntax   Description            Octal Code

Ai Sj        Transmit (Sj) to Ai    023ij0

Ai VL        Read vector length     023i01



Instruction 023ij0 enters the low-order 24 bits of (Sj) into Ai. The
high-order bits of (Sj) are ignored.

Instruction 023i01 enters the VL register contents into Ai.



HOLD ISSUE CONDITIONS: Ai reserved

                       For instruction 023ij0, Sj reserved (except S0)

EXECUTION TIME:        Instruction issue, 1 CP

                       Ai ready, 1 CP

SPECIAL CASES:         (Sj)=0 if j=0.

                       If (A1)=0, the sequence:
                           VL A1
                           A2 VL
                       leaves (A2)=100₈

                       If (A1)=23₈, the sequence:
                           VL A1
                           A2 VL
                       leaves (A2)=23₈

                       If (A1)=123₈, the sequence:
                           VL A1
                           A2 VL
                       leaves (A2)=23₈

                       The 2^6 bit in the VL register is a 1 if the
                       low-order 6 bits are 0; otherwise, the 2^6 bit
                       is a 0.
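
The three VL sequences in the special cases can be reproduced with a small Python sketch. The function name is hypothetical, and the model folds the write to VL and the later read into one step, since the text does not say at which point the 2^6 bit is formed.

```python
def vl_after_write(a):
    """Value read back from VL after 'VL Ak' with (Ak) = a.

    Only the low-order 6 bits are kept; if they are all 0, the 2^6 bit
    is a 1, giving 100 octal (a vector length of 64).
    """
    low = a & 0o77
    return 0o100 if low == 0 else low
```

With this model, inputs of 0, 23 octal, and 123 octal give 100, 23, and 23 octal respectively, matching the listed results.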
INSTRUCTIONS 024 - 025

CAL Syntax   Description             Octal Code

Ai Bjk       Transmit (Bjk) to Ai    024ijk

Bjk Ai       Transmit (Ai) to Bjk    025ijk



Instruction 024 enters the contents of Bjk into Ai. 
Instruction 025 enters the contents of Ai into Bjk. 



HOLD ISSUE CONDITIONS: Instruction 034 or 035 in process

                       For instruction 024ijk, instruction 025ijk
                       issued in previous CP

                       Ai reserved

EXECUTION TIME:        For instruction 024, Ai ready, 1 CP

                       Instruction issue, 1 CP

SPECIAL CASES:         None
INSTRUCTION 026

CAL Syntax   Description                              Octal Code

Ai PSj       Population count of (Sj) to Ai           026ij0

Ai QSj       Population count parity of (Sj) to Ai    026ij1

Ai SBj       Transfer (SBj) to Ai                     026ij7



Instruction 026ij0 counts the number of bits set to 1 in (Sj) and 
enters the result into the low-order 7 bits of Ai. The high-order 17 
bits of Ai are zeroed. If (Sj)=0, then (Ai)=0. 

Instruction 026ij1 counts the number of bits set to 1 in (Sj). Then,
the low-order bit, showing the odd/even state of the result, is
transferred to the low-order bit position of the Ai register. The
high-order 23 bits are cleared. The actual population count is not
transferred.

Instructions 026ij0 and 026ij1 are executed in the Population/
Leading Zero Count functional unit.

Instruction 026ij7 transfers the SBj register contents shared between 
the CPUs to Ai. 
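
The two population count variants can be sketched in Python. The function names are hypothetical; the S register is modeled as a 64-bit unsigned integer, and the result always fits in the low-order 7 bits of Ai because it cannot exceed 64.

```python
S_MASK = (1 << 64) - 1  # S registers are 64 bits


def pop_count(s):
    """Model of 026ij0: number of 1 bits in (Sj), in the range 0..64."""
    return bin(s & S_MASK).count("1")


def pop_parity(s):
    """Model of 026ij1: low-order bit of the population count."""
    return pop_count(s) & 1
```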



HOLD ISSUE CONDITIONS: Ai reserved

                       Sj reserved (except S0)

                       For instruction 026ij7, hold issue 1 CP, then
                       2+† CP more after Ai not reserved. Minimum
                       3 CP hold.

EXECUTION TIME:        Instruction issue, 1 CP

                       For instructions 026ij0 and 026ij1, Ai
                       ready 4 CPs

                       For instruction 026ij7, Ai ready 1 CP

SPECIAL CASES:         For instructions 026ij0 and 026ij1, (Ai)=0 if j=0.

                       For instruction 026ij7, (Ai)=0 if CLN=0.



† If more than one CPU attempts to access semaphores or shared
registers in the same CP, a scanner resolves the conflict. Refer to
shared register explanation in section 2.
INSTRUCTION 027

CAL Syntax   Description                         Octal Code

Ai ZSj       Leading zero count of (Sj) to Ai    027ij0

SBj Ai       Transfer (Ai) to SBj                027ij7



Instruction 027ij0 counts the number of leading zeros in Sj and enters
the result into the low-order 7 bits of Ai. The high-order 17 bits of
Ai are zeroed. Instruction 027ij0 is executed in the Population/Leading
Zero Count functional unit.

Instruction 027ij7 stores (Ai) to the SBj register, which is shared 
between the CPUs in the same cluster. 



HOLD ISSUE CONDITIONS: For instruction 027ij0, instruction 033 issued
                       in CP 2

                       Ai reserved

                       Sj reserved (except S0)

                       For instruction 027ij7, hold issue 1 CP, then
                       2+† CP more after Ai not reserved. Minimum
                       3 CP hold.

EXECUTION TIME:        Instruction issue, 1 CP

                       For instruction 027ij0, Ai ready, 3 CPs

                       For instruction 027ij7, SBj ready, 1 CP

SPECIAL CASES:         For instruction 027ij0, (Ai)=64 if j=0.

                       For instruction 027ij0, (Ai)=0 if (Sj) is
                       negative.

                       Instruction 027ij7 is a no-op if CLN=0.



† If more than one CPU attempts to access semaphores or shared registers
in the same CP, a scanner resolves the conflict. Refer to shared
register explanation in section 2.
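
The leading zero count and its two special cases can be checked with a minimal Python sketch. The function name is hypothetical; (Sj) is modeled as a 64-bit unsigned integer, and j=0 is modeled by passing 0, since (Sj)=0 when j=0.

```python
def leading_zeros(s):
    """Model of 027ij0: count of 0 bits above the highest 1 bit of (Sj)."""
    s &= (1 << 64) - 1
    return 64 - s.bit_length()
```

An operand of 0 yields 64, and any operand with the sign bit set (a negative value) yields 0, as the special cases state.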
INSTRUCTIONS 030 - 031

CAL Syntax   Description                                 Octal Code

Ai Aj+Ak     Integer sum of (Aj) and (Ak) to Ai          030ijk

Ai Ak†       Transmit (Ak) to Ai                         030i0k

Ai Aj+1†     Integer sum of (Aj) and 1 to Ai             030ij0

Ai Aj-Ak     Integer difference (Aj) less (Ak) to Ai     031ijk

Ai -1†       Transmit -1 to Ai                           031i00

Ai -Ak†      Transmit the negative of (Ak) to Ai         031i0k

Ai Aj-1†     Integer difference (Aj) less 1 to Ai        031ij0



† Special CAL syntax



Instruction 030 forms the integer sum of (Aj) and (Ak) and enters the
result into Ai. No overflow is detected.

Instruction 031 forms the integer difference of (Aj) and (Ak) and
enters the result into Ai. No overflow is detected.

Instructions 030 and 031 are executed in the Address Add functional unit.



HOLD ISSUE CONDITIONS: Ai reserved

                       Aj or Ak reserved (except A0)

EXECUTION TIME:        Instruction issue, 1 CP

                       Ai ready, 2 CPs

SPECIAL CASES:         For instruction 030:

                       (Ai)=(Ak) if j=0 and k≠0.
                       (Ai)=1 if j=0 and k=0.
                       (Ai)=(Aj) + 1 if j≠0 and k=0.

                       For instruction 031:

                       (Ai)= -(Ak) if j=0 and k≠0.
                       (Ai)= -1 if j=0 and k=0.
                       (Ai)=(Aj) - 1 if j≠0 and k=0.
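
All of the special cases for instructions 030 and 031 follow from two operand rules: a j designator of 0 supplies the value 0, and a k designator of 0 supplies the value 1. The Python sketch below illustrates this under those assumptions; the function names are hypothetical, and results wrap to 24 bits because no overflow is detected.

```python
A_MASK = (1 << 24) - 1  # A registers are 24 bits


def operand_j(a_regs, j):
    """(Aj) as seen by the Address Add unit: 0 when j = 0."""
    return 0 if j == 0 else a_regs[j]


def operand_k(a_regs, k):
    """(Ak) as seen by the Address Add unit: 1 when k = 0."""
    return 1 if k == 0 else a_regs[k]


def add_030(a_regs, j, k):
    """Model of 030ijk: 24-bit integer sum, no overflow detection."""
    return (operand_j(a_regs, j) + operand_k(a_regs, k)) & A_MASK


def sub_031(a_regs, j, k):
    """Model of 031ijk: 24-bit integer difference, no overflow detection."""
    return (operand_j(a_regs, j) - operand_k(a_regs, k)) & A_MASK
```

With these rules, 030i00 produces 1 and 031i00 produces the all-ones pattern, which is -1 in twos complement.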
INSTRUCTION 032

CAL Syntax   Description                               Octal Code

Ai Aj*Ak     Integer product of (Aj) and (Ak) to Ai    032ijk



Instruction 032 forms the integer product of (Aj) and (Ak) and enters
the low-order 24 bits of the result into Ai. No overflow is detected.

Instruction 032 is executed in the Address Multiply functional unit.



HOLD ISSUE CONDITIONS: Ai reserved

                       Aj or Ak reserved (except A0)

EXECUTION TIME:        Instruction issue, 1 CP

                       Ai ready, 4 CPs

SPECIAL CASES:         (Ai)=0 if j=0.
                       (Ak)=1 if k=0.
                       Thus, (Ai)=(Aj) if j≠0 and k=0.
INSTRUCTION 033

CAL Syntax   Description                                     Octal Code

Ai CI        Channel number of highest priority interrupt    033i00
             request to Ai

Ai CA,Aj     Current address of channel (Aj) to Ai           033ij0

Ai CE,Aj     Error flag of channel (Aj) to Ai                033ij1



Instruction 033 enters channel status information into Ai. The j and 
k designators and the contents of Aj define the desired information. 

The channel number of the highest priority interrupt request is entered 
into Ai when the j designator is 0. The contents of Aj specify a 
channel number when the j designator is nonzero. The value of the 
Current Address (CA) register for the channel is entered into Ai when 
the k designator is 0. The error flag for the channel is entered into 
the low-order bit of Ai when the k designator is 1. The high-order 
bits of Ai are cleared. The error flag can be cleared only in monitor 
mode using instruction 0012. 

Instruction 033 does not interfere with channel operation and is not 
protected from user execution. 



HOLD ISSUE CONDITIONS: Ai reserved

                       Aj reserved (except A0)

EXECUTION TIME:        Instruction issue, 1 CP

                       Ai ready, 4 CPs

SPECIAL CASES:         (Ai)=Highest priority channel causing interrupt
                       if (Aj)=0.

                       (Ai)=Current address of channel (Aj) if
                       (Aj)≠0 and k=0.

                       (Ai)=I/O error flag of channel (Aj) if
                       (Aj)≠0 and k=1.






INSTRUCTION 033 (continued)

SPECIAL CASES:         6 CPs must elapse after instruction 0012j0 issues
(continued)            before issuing instruction 033i00.

                       Before the results of a 033ij0 instruction to
                       channels 10 through 17 or a 033ij1 instruction
                       to channels 6 or 7 are valid, there is a 12-CP
                       latency. Therefore, before a 033ijx
                       instruction can be issued to these channels, 12
                       CPs must elapse after issuing a channel function
                       or completing a channel transfer.

                       If instruction 033 issues every 10 CPs (in a
                       loop), the same results are always returned to
                       Ai.

                       When k=1 and (Aj)=6 or 7:

                       Bits 2^0 through 2^17 contain the remaining
                       block length.

                       Bit 2^18 indicates a request in progress.

                       Bit 2^19 returns a 0.

                       Bit 2^20 indicates a block length error.

                       Bit 2^21 indicates either an SSD double-bit
                       memory error (during a read SSD operation) or
                       an SSD double-bit channel error (during a write
                       SSD operation).

                       Bit 2^22 indicates a CPU double-bit memory
                       error.

                       Bit 2^23 indicates a fatal error (if bit
                       2^20, 2^21, or 2^22 is set).
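
The status word returned for channels 6 and 7 when k=1 can be unpacked as in the Python sketch below. The bit positions are an assumption: they are read as 2^0 through 2^17 for the block length and 2^18 through 2^23 for the flags, which is how the damaged exponents in the list above appear to run. The function and field names are hypothetical.

```python
def decode_channel_error(ai):
    """Split the 24-bit (Ai) result of 033ij1 for channel 6 or 7 into fields.

    Bit positions are assumed as described in the lead-in text.
    """
    return {
        "remaining_block_length": ai & ((1 << 18) - 1),    # bits 2^0..2^17
        "request_in_progress": bool((ai >> 18) & 1),       # bit 2^18
        "block_length_error": bool((ai >> 20) & 1),        # bit 2^20
        "ssd_double_bit_error": bool((ai >> 21) & 1),      # bit 2^21
        "cpu_double_bit_memory_error": bool((ai >> 22) & 1),  # bit 2^22
        "fatal_error": bool((ai >> 23) & 1),               # bit 2^23
    }
```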
INSTRUCTIONS 034 - 037

CAL Syntax       Description                                    Octal Code

Bjk,Ai ,A0       Block transfer (Ai) words from memory          034ijk
                 starting at address (A0) to B registers
                 starting at register jk

Bjk,Ai 0,A0†     Block transfer (Ai) words from memory          034ijk
                 starting at address (A0) to B registers
                 starting at register jk

,A0 Bjk,Ai       Block transfer (Ai) words from B registers     035ijk
                 starting at register jk to memory starting
                 at address (A0)

0,A0 Bjk,Ai†     Block transfer (Ai) words from B registers     035ijk
                 starting at register jk to memory starting
                 at address (A0)

Tjk,Ai ,A0       Block transfer (Ai) words from memory          036ijk
                 starting at address (A0) to T registers
                 starting at register jk

Tjk,Ai 0,A0†     Block transfer (Ai) words from memory          036ijk
                 starting at address (A0) to T registers
                 starting at register jk

,A0 Tjk,Ai       Block transfer (Ai) words from T registers     037ijk
                 starting at register jk to memory starting
                 at address (A0)

0,A0 Tjk,Ai†     Block transfer (Ai) words from T registers     037ijk
                 starting at register jk to memory starting
                 at address (A0)



† Special CAL syntax



Instructions 034 through 037 perform block transfers between memory and B
or T registers.

In all the instructions, the amount of data transferred is specified by
the low-order 7 bits of (Ai). Refer to special cases for details.

The first register involved in the transfer is specified by jk.
Successive transfers involve successive B or T registers until B77 or T77
is reached. Since processing of the registers is circular, B00 is
processed after B77 and T00 is processed after T77 if the count in (Ai)
is not exhausted.
INSTRUCTIONS 034 - 037 (continued)

The first memory location referenced by the transfer instruction is 
specified by (A0). The A0 register contents are not altered by execution 
of the instruction. Memory references are incremented by 1 for 
successive transfers. 

For transfers of B registers to memory, each 24-bit value is right 
adjusted in the word, high-order 40 bits are zeroed. When transferring 
from memory to B registers, only low-order 24 bits are transmitted; 
high-order 40 bits are ignored. 
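
The circular register addressing and the 24-bit truncation for B registers can be sketched in Python. This is an illustrative model only: memory and the B register file are plain lists, the function names are hypothetical, and port usage and timing are not represented.

```python
B_MASK = (1 << 24) - 1  # B registers are 24 bits


def block_load_b(memory, b_regs, a0, ai, jk):
    """Model of 034ijk: copy words from memory into B registers."""
    count = ai & 0o177                    # low-order 7 bits of (Ai)
    for n in range(count):
        reg = (jk + n) % 64               # B77 is followed by B00
        b_regs[reg] = memory[a0 + n] & B_MASK  # high-order 40 bits ignored


def block_store_b(memory, b_regs, a0, ai, jk):
    """Model of 035ijk: copy B registers into memory words."""
    count = ai & 0o177
    for n in range(count):
        # each 24-bit value is right adjusted with high-order bits zeroed
        memory[a0 + n] = b_regs[(jk + n) % 64] & B_MASK
```

A count of 0 transfers nothing, and a starting register near B77 continues into B00 as the text describes.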



HOLD ISSUE CONDITIONS: A0 reserved

                       Ai reserved

                       Scalar reference in CP1, CP2, CP3, or CP4

                       For instruction 034, Port A busy or instruction
                       035 in process or unidirectional memory mode and
                       Port C busy

                       For instruction 035, Port C busy or instruction
                       034 in process or unidirectional memory mode and
                       Port A or Port B busy

                       For instruction 036, Port B busy or instruction
                       037 in process or unidirectional memory mode and
                       Port C busy

                       For instruction 037, Port C busy or instruction
                       036 in process or unidirectional memory mode and
                       Port A or Port B busy



EXECUTION TIME:        Instruction issue, 1 CP

                       For instruction 034 or 036:

                       B or T register reserved 16 CPs + (Ai) if
                       (Ai)≠0; 6 CPs if (Ai)=0.
                       Port A or B busy for (Ai) + 6 CPs if
                       (Ai)≠0; 4 CPs if (Ai)=0.

                       For instruction 035 or 037:

                       B or T register reserved 5 CPs + (Ai) if
                       (Ai)≠0; 4 CPs if (Ai)=0.
                       Port C busy for (Ai) + 6 CPs if (Ai)≠0;
                       4 CPs if (Ai)=0.
INSTRUCTIONS 034 - 037 (continued)

SPECIAL CASES:         (Ai)=0 causes a zero-block transfer.

                       (Ai) in the range greater than 100₈ and less
                       than 200₈ causes a wrap-around condition.

                       If (Ai) is greater than 177₈, bits 2^7
                       through 2^23 are truncated. The block length is
                       equal to the value of 2^0 through 2^6.



                       NOTE

                       Instruction 034 uses Port A, instruction 035 uses
                       Port C, instruction 036 uses Port B, and instruction
                       037 uses Port C.
INSTRUCTIONS 040 - 041

CAL Syntax   Description                          Octal Code

Si exp       Transmit jkm to Si                   040ijkm

Si exp       Transmit complement of jkm to Si     041ijkm



The 2-parcel instructions 040 and 041 enter immediate values into an S 
register. 

Instruction 040 enters a 64-bit value composed of the 22-bit jkm field
and 42 high-order bits of 0 into Si.

Instruction 041 enters a 64-bit value that is the complement of a value
formed by the 22-bit jkm field and 42 high-order bits of 0 into Si.
The complement is formed by changing all 1 bits to 0 and all 0 bits
to 1. Thus, for instruction 041, the high-order 42 bits of Si are set
to 1's. The instruction provides for entering a negative value into
Si. Since the register value is the ones complement of jkm, the jkm
value must be one less than the magnitude of the desired twos complement
number: jkm should be 0 to get -1, 1 to get -2, 3 to get -4, and so on.



HOLD ISSUE CONDITIONS: Si reserved 



Second parcel not in a buffer 



EXECUTION TIME: 



Instruction issue: 

Both parcels in same buffer, 2 CPs 

Both parcels in different buffers, 4 CPs 

Si ready, 1 CP 



SPECIAL CASES: 



None 
INSTRUCTIONS 042 - 043

CAL Syntax   Description                                        Octal Code

Si <exp      Form exp bits of ones mask in Si from right;       042ijk
             jk field gets 64-exp.

Si #>exp†    Form exp bits of zeros mask in Si from left;       042ijk
             jk field gets exp.

Si 1†        Enter 1 into Si                                    042i77

Si -1†       Enter -1 into Si                                   042i00

Si >exp      Form exp bits of ones mask in Si from left;        043ijk
             jk field gets exp.

Si #<exp†    Form exp bits of zeros mask in Si from right;      043ijk
             jk field gets 64-exp.

Si 0†        Clear Si                                           043i00



† Special CAL syntax



Instruction 042 generates a mask of 64 - jk ones from right to left in
Si. For example, if jk=0, Si contains all 1 bits (integer value= -1)
and if jk=77₈, Si contains zeros in all but the low-order bit
(integer value=1).

Instruction 043 generates a mask of jk ones from left to right in Si.
For example, if jk=0, Si contains all 0 bits (integer value=0) and if
jk=77₈, Si contains ones in all but the low-order bit (integer value= -2).
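
The two mask forms are complements of each other for the same jk, which the following Python sketch illustrates. The function names are hypothetical, and Si is modeled as a 64-bit unsigned integer.

```python
S_MASK = (1 << 64) - 1  # S registers are 64 bits


def mask_042(jk):
    """Model of 042ijk: 64 - jk ones, right justified."""
    return (1 << (64 - jk)) - 1


def mask_043(jk):
    """Model of 043ijk: jk ones, left justified."""
    return S_MASK ^ mask_042(jk)


def as_signed64(x):
    """Interpret a 64-bit pattern as a twos complement integer."""
    return x - (1 << 64) if x & (1 << 63) else x
```

The four examples in the text correspond to mask_042(0), mask_042(0o77), mask_043(0), and mask_043(0o77).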

The Scalar Logical functional unit executes instructions 042 and 043. 



HOLD ISSUE CONDITIONS: Si reserved

EXECUTION TIME:        Instruction issue, 1 CP

                       Si ready, 1 CP

SPECIAL CASES:         None
INSTRUCTIONS 044 - 051

CAL Syntax       Description                                     Octal Code

Si Sj&Sk         Logical product of (Sj) and (Sk) to Si          044ijk

Si Sj&SB†        Sign bit of (Sj) to Si                          044ij0

Si SB&Sj†        Sign bit of (Sj) to Si (j≠0)                    044ij0

Si #Sk&Sj        Logical product of (Sj) and complement of       045ijk
                 (Sk) to Si

Si #SB&Sj†       (Sj) with sign bit cleared to Si                045ij0

Si Sj\Sk         Logical difference of (Sj) and (Sk) to Si       046ijk

Si Sj\SB†        Toggle sign bit of (Sj), then enter into Si     046ij0

Si SB\Sj†        Toggle sign bit of (Sj), then enter into Si     046ij0
                 (j≠0)

Si #Sj\Sk        Logical equivalence of (Sk) and (Sj) to Si      047ijk

Si #Sk†          Transmit ones complement of (Sk) to Si          047i0k

Si #Sj\SB†       Logical equivalence of (Sj) and sign bit        047ij0
                 to Si

Si #SB\Sj†       Logical equivalence of (Sj) and sign bit to     047ij0
                 Si (j≠0)

Si #SB†          Enter ones complement of sign bit into Si       047i00

Si Sj!Si&Sk      Logical product of (Si) and (Sk)                050ijk
                 complement ORed with logical product
                 of (Sj) and (Sk) to Si

Si Sj!Si&SB†     Scalar merge of (Si) and sign bit of (Sj)       050ij0
                 to Si

Si Sj!Sk         Logical sum of (Sj) and (Sk) to Si              051ijk

Si Sk†           Transmit (Sk) to Si                             051i0k

Si Sj!SB†        Logical sum of (Sj) and sign bit to Si          051ij0



† Special CAL syntax
INSTRUCTIONS 044 - 051 (continued)

CAL Syntax       Description                                     Octal Code

Si SB!Sj†        Logical sum of (Sj) and sign bit to Si (j≠0)    051ij0

Si SB†           Enter sign bit into Si                          051i00



† Special CAL syntax



NOTE 

For instructions 044 through 051, SB with no register 
designator is the sign bit, not Shared Address register. 



The Scalar Logical functional unit executes instructions 044 through 051.

Instruction 044 forms the logical product (AND) of (Sj) and (Sk) and
enters the result into Si. Bits of Si are set to 1 when
corresponding bits of (Sj) and (Sk) are 1, as in the following
example:

    (Sj) = 1 1 0 0
    (Sk) = 1 0 1 0
    (Si) = 1 0 0 0

(Sj) is transmitted to Si if the j and k designators have the
same nonzero value. Si is cleared if the j designator is 0. The
sign bit of (Sj) is transmitted to Si if the j designator is
nonzero and the k designator is 0.

Instruction 045 forms the logical product (AND) of (Sj) and the
complement of (Sk) and enters the result into Si. Bits of Si are
set to 1 when corresponding bits of (Sj) and the complement of (Sk)
are 1, as in the following example where (Sk') = complement of (Sk):

 if (Sk)  = 1 0 1 0

    (Sj)  = 1 1 0 0
    (Sk') = 0 1 0 1
    (Si)  = 0 1 0 0

Si is cleared if the j and k designators have the same value or if
the j designator is 0. (Sj) with the sign bit cleared is transmitted
to Si if the j designator is nonzero and the k designator is 0.
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INSTRUCTIONS 044 - 051 (continued) 

Instruction 046 forms the logical difference (exclusive OR) of (Sj) and 
(Sk) and enters the result into Si. Bits of Si are set to 1 when 
corresponding bits of (Sj) and (Sk) are different, as in the 
following example: 

(Sj) =110 
(Sk) = 10 10 
(Si) =0110 

Si is cleared if the j and k designators have the same nonzero value. 
(Sk) is transmitted to Si if the j designator is and the k designator 
is nonzero. The sign bit of (Sj) is complemented and the result is 
transmitted to Si if the j designator is nonzero and the k designator 
is 0. 

Instruction 047 forms the logical equivalence of (Sj) and (Sk) , and 
enters the result into Si. Bits of Si are set to 1 when corresponding 
bits of (Sj) and (Sk) are the same as in the following example: 

(Sj) =110 
(S*) = 10 10 
(Si) =10 1 

Si is set to all ones if the j and k designators have the same nonzero 
value. The complement of (Sk) is transmitted to Si if the j designator 
is and the k designator is nonzero. All bits except the sign bit of 
(Sj) are complemented and the result is transmitted to Si if the j 
designator is nonzero and the & designator is 0. The result is the 
complement produced by instruction 046. 

Instruction 050 merges the contents of (Sj) with (Si) depending on the
ones mask in Sk. The result is defined by the following Boolean equation,
where Sk' is the complement of Sk, as illustrated:

     (Si) = (Sj)(Sk) + (Si)(Sk')

     if (Sk)  = 11110000
        (Sk') = 00001111
        (Si)  = 11001100
        (Sj)  = 10101010

        (Si)  = 10101100

Instruction 050 is intended for merging portions of 64-bit words into a
composite word. Bits of Si are cleared when the corresponding bits of
Sk are 1 if the j designator is 0 and the k designator is nonzero.
The sign bit of (Sj) replaces the sign bit of Si if the j designator
is nonzero and the k designator is 0. The sign bit of Si is cleared if
the j and k designators are both 0.
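
The merge equation and its designator special cases can be checked with a short Python model. This is only an illustrative sketch, not CAL: the function name scalar_merge and the j and k keyword parameters are hypothetical, and S registers are modeled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1   # S registers hold 64 bits
SIGN_BIT = 1 << 63       # value used for (Sk) when k = 0


def scalar_merge(si, sj, sk, j=1, k=1):
    """Model of instruction 050: (Si) = (Sj)(Sk) + (Si)(Sk')."""
    if j == 0:
        sj = 0            # (Sj)=0 if j=0
    if k == 0:
        sk = SIGN_BIT     # (Sk)=2^63 if k=0
    # Take Sj where the mask is 1 and keep Si where the mask is 0.
    return ((sj & sk) | (si & ~sk)) & MASK64


# The 8-bit example from the text.
merged = scalar_merge(0b11001100, 0b10101010, 0b11110000)
```

With j=0 the first term vanishes, so the instruction only clears the bits of Si selected by the mask; with k=0 the mask is the sign bit alone.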

Instruction 051 forms the logical sum (inclusive OR) of (Sj) and (Sk)
and enters the result into Si. Bits of Si are set when one of the
corresponding bits of (Sj) and (Sk) is set, as in the following
example:

     (Sj) = 1 1 0 0
     (Sk) = 1 0 1 0
     (Si) = 1 1 1 0

(Sj) is transmitted to Si if the j and k designators have the same
nonzero value. (Sk) is transmitted to Si if the j designator is 0 and
the k designator is nonzero. (Sj) with the sign bit set to 1 is
transmitted to Si if the j designator is nonzero and the k designator
is 0. A ones mask consisting of only the sign bit is entered into Si if
the j and k designators are both 0.



HOLD ISSUE CONDITIONS:  Si reserved
                        Sj or Sk reserved (except S0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 1 CP

SPECIAL CASES:          (Sj)=0 if j=0.
                        (Sk)=2^63 if k=0.
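
The two special cases explain every designator-dependent behavior described for instructions 046, 047, and 051. The following Python sketch models that; the function names are hypothetical, and registers are treated as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def read_operands(sj, sk, j, k):
    """Apply the special cases: (Sj)=0 if j=0, (Sk)=2^63 if k=0."""
    return (0 if j == 0 else sj), (SIGN_BIT if k == 0 else sk)


def logical_difference(sj, sk, j=1, k=2):
    """Instruction 046: exclusive OR."""
    a, b = read_operands(sj, sk, j, k)
    return a ^ b


def logical_equivalence(sj, sk, j=1, k=2):
    """Instruction 047: complement of the exclusive OR."""
    a, b = read_operands(sj, sk, j, k)
    return ~(a ^ b) & MASK64


def logical_sum(sj, sk, j=1, k=2):
    """Instruction 051: inclusive OR."""
    a, b = read_operands(sj, sk, j, k)
    return a | b
```

For example, logical_difference(x, 0, j=1, k=0) returns x with its sign bit complemented, because the second operand becomes 2^63.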
INSTRUCTIONS 052 - 055

CAL Syntax        Description                                  Octal Code

S0  Si<exp        Shift (Si) left exp=jk places to S0          052ijk
S0  Si>exp        Shift (Si) right exp=64-jk places to S0      053ijk
Si  Si<exp        Shift (Si) left exp=jk places to Si          054ijk
Si  Si>exp        Shift (Si) right exp=64-jk places to Si      055ijk



The Scalar Shift functional unit executes instructions 052 through 055.
They shift values in an S register by an amount specified by jk. All
shifts are end off with zero fill.

Instruction 052 shifts (Si) left jk places and enters the result into
S0. Shift range is 0 through 63 left.

Instruction 053 shifts (Si) right by 64 - jk places and enters the
result into S0. Shift range is 1 through 64 right.

Instruction 054 shifts (Si) left jk places and enters the result into
Si. Shift range is 0 through 63 left.

Instruction 055 shifts (Si) right by 64 - jk places and enters the
result into Si. Shift range is 1 through 64 right.
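
The relationship between the jk field and the shift count can be sketched in Python as follows. The helper names are hypothetical; jk stands for the 6-bit value formed by the j and k designators.

```python
MASK64 = (1 << 64) - 1


def shift_left(si, jk):
    """Instructions 052/054: jk is the left shift count, 0 through 63."""
    assert 0 <= jk <= 63
    return (si << jk) & MASK64          # end off, zero fill


def shift_right(si, jk):
    """Instructions 053/055: the field holds 64 minus the shift count."""
    assert 0 <= jk <= 63
    return (si & MASK64) >> (64 - jk)   # count is 1 through 64


def jk_for_right_shift(places):
    """Field value an assembler would need for a right shift of `places`."""
    assert 1 <= places <= 64
    return 64 - places
```

A jk field of 0 in a right shift therefore means a shift of 64 places, which clears the result.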



HOLD ISSUE CONDITIONS:  Instruction 056, 057, 060, or 061 issued in
                        previous CP
                        Si reserved
                        For instructions 052 and 053, S0 reserved

EXECUTION TIME:         Instruction issue, 1 CP
                        For instructions 052 and 053, S0 ready, 2 CPs
                        For instructions 054 and 055, Si ready, 2 CPs

SPECIAL CASES:          None
INSTRUCTIONS 056 - 057

CAL Syntax          Description                                        Octal Code

Si  Si,Sj<Ak        Shift (Si) and (Sj) left by (Ak) places to Si      056ijk
Si  Si,Sj<1†        Shift (Si) and (Sj) left one place to Si           056ij0
Si  Si<Ak†          Shift (Si) left (Ak) places to Si                  056i0k
Si  Sj,Si>Ak        Shift (Sj) and (Si) right by (Ak) places to Si     057ijk
Si  Sj,Si>1†        Shift (Sj) and (Si) right one place to Si          057ij0
Si  Si>Ak†          Shift (Si) right (Ak) places to Si                 057i0k

† Special CAL syntax



The Scalar Shift functional unit executes instructions 056 and 057. They
shift 128-bit values formed by logically joining two S registers. Shift
counts are obtained from register Ak. All shift counts, (Ak), are
considered positive and all 24 bits of (Ak) are used for the shift
count. A shift of one place occurs if the k designator is 0. If
j=0, the shifts function as if the shifted value were 64 bits rather
than 128 bits because the Sj value used is 0.

The shifts are circular if the shift count does not exceed 64, and the
i and j designators are equal and nonzero. For instructions 056 and
057, (Sj) is unchanged, provided i≠j. For shifts greater than
64, the shift is end off with zero fill. If i=j and the shift is
greater than 64, the shift is the same as if the respective instruction
054 or 055 was used with a shift count of 64 or less.

Instruction 056 performs left shifts of (Si) and (Sj) with (Si) 
initially the most significant bits of the double register. The 
high-order 64 bits of the result are transmitted to Si. Si is 
cleared if the shift count exceeds 127. Instruction 056 produces the 
same result as instruction 054 if the shift count does not exceed 63 and 
the j designator is 0. 

Instruction 057 performs right shifts of (Sj) and (Si) with (Sj) 
initially the most significant bits of the double register. The 
low-order 64 bits of the result are transmitted to Si. Si is cleared 
if the shift count exceeds 127. Instruction 057 produces the same result 
as instruction 055 if the shift count does not exceed 63 and the j 
designator is 0. 
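
The double-register shifts can be modeled in Python by forming the 128-bit value explicitly. This is a sketch with hypothetical function names; count stands for (Ak) after the k=0 special case has been applied by the caller.

```python
MASK64 = (1 << 64) - 1


def double_shift_left(si, sj, count, j=1):
    """Instruction 056: (Si) is the high half; high 64 bits go to Si."""
    if j == 0:
        sj = 0                      # (Sj)=0 if j=0
    if count > 127:
        return 0                    # everything is shifted off the end
    pair = (si << 64) | sj
    return ((pair << count) >> 64) & MASK64


def double_shift_right(si, sj, count, j=1):
    """Instruction 057: (Sj) is the high half; low 64 bits go to Si."""
    if j == 0:
        sj = 0
    if count > 127:
        return 0
    pair = (sj << 64) | si
    return (pair >> count) & MASK64


x = 0xF000000000000001
rotated_left = double_shift_left(x, x, 4)    # i=j: circular shift
rotated_right = double_shift_right(x, x, 4)
```

Passing the same value for both halves models the i=j case, where bits leaving one end of the register re-enter at the other for counts up to 64.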
HOLD ISSUE CONDITIONS:  Si reserved
                        Sj or Ak reserved (except S0 and/or A0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 3 CPs

SPECIAL CASES:          (Sj)=0 if j=0.
                        (Ak)=1 if k=0.
                        Circular shift if i=j≠0 and (Ak) is greater
                        than or equal to 0 and less than or equal to 64
INSTRUCTIONS 060 - 061

CAL Syntax        Description                                  Octal Code

Si  Sj+Sk         Integer sum of (Sj) and (Sk) to Si           060ijk
Si  Sj-Sk         Integer difference of (Sj) and (Sk) to Si    061ijk
Si  -Sk†          Transmit negative of (Sk) to Si              061i0k

† Special CAL syntax



Instruction 060 forms the integer sum of (Sj) and (Sk) and enters
the result into Si. No overflow is detected.

Instruction 061 forms the integer difference of (Sj) and (Sk) and
enters the result into Si. No overflow is detected.

The Scalar Add functional unit executes instructions 060 and 061. 



HOLD ISSUE CONDITIONS:  Si reserved
                        Sj or Sk reserved (except S0)

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 3 CPs

SPECIAL CASES:          (Si)=2^63 if j=0 and k=0.

                        For instruction 060:
                          (Si)=(Sk) if j=0 and k≠0.
                          (Si)=(Sj) with 2^63 complemented if
                          j≠0 and k=0.

                        For instruction 061:
                          (Si)= -(Sk) if j=0 and k≠0.
                          (Si)=(Sj) with 2^63 complemented if
                          j≠0 and k=0.
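
These special cases follow from 64-bit wraparound arithmetic combined with the operand substitutions (Sj)=0 if j=0 and (Sk)=2^63 if k=0. The Python sketch below assumes twos complement wraparound, which is consistent with "No overflow is detected"; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def read_operands(sj, sk, j, k):
    """(Sj)=0 if j=0; (Sk)=2^63 if k=0."""
    return (0 if j == 0 else sj), (SIGN_BIT if k == 0 else sk)


def integer_sum(sj, sk, j=1, k=1):
    """Instruction 060, modulo 2^64."""
    a, b = read_operands(sj, sk, j, k)
    return (a + b) & MASK64


def integer_difference(sj, sk, j=1, k=1):
    """Instruction 061, modulo 2^64."""
    a, b = read_operands(sj, sk, j, k)
    return (a - b) & MASK64
```

Adding or subtracting 2^63 modulo 2^64 changes only bit 2^63, which is why both instructions complement that bit when k=0.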






INSTRUCTIONS 062 - 063 



CAL Syntax        Description                                          Octal Code

Si  Sj+FSk        Floating-point sum of (Sj) and (Sk) to Si            062ijk
Si  +FSk†         Normalize (Sk) to Si                                 062i0k
Si  Sj-FSk        Floating-point difference of (Sj) and (Sk) to Si     063ijk
Si  -FSk†         Transmit normalized negative of (Sk) to Si           063i0k

† Special CAL syntax



The Floating-point Add functional unit executes instructions 062 and 
063. Operands are assumed to be in floating-point format. The result is 
normalized even if the operands are not normalized. 

Instruction 062 forms the sum of the floating-point quantities in Sj 
and Sk and enters the normalized result into Si. 

Instruction 063 forms the difference of the floating-point quantities in 
Sj and Sk and enters the normalized result into Si. 

Section 4 describes overflow conditions. For floating-point operands
with the sign bit set (bit=1), zero exponent and zero coefficient are
treated as 0 (that is, all 64 bits=0).††

HOLD ISSUE CONDITIONS:  Si reserved
                        Sj or Sk reserved (except S0)
                        Instructions 170 through 173 in process, unit
                        busy (VL) + 4 CPs

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 6 CPs

†† Considered -0. No floating-point unit generates a -0 except the
   Floating-point Multiply functional unit if one of the operands was a
   -0. Normally, -0 occurs in logical manipulations when a sign is
   attached to a number; that number can be 0.
SPECIAL CASES:          For instruction 062:
                          (Si)=(Sk) normalized if (Sk) exponent is
                          valid, j=0 and k≠0.
                          (Si)=(Sj) normalized if (Sj) exponent is
                          valid, j≠0 and k=0.

                        For instruction 063:
                          (Si)= -(Sk) normalized if (Sk) exponent is
                          valid, j=0 and k≠0. Sign of (Si) is
                          opposite that of (Sk) if (Sk)≠0.
                          (Si)=(Sj) normalized if (Sj) exponent is
                          valid, j≠0 and k=0.
INSTRUCTIONS 064 - 067

CAL Syntax        Description                                       Octal Code

Si  Sj*FSk        Floating-point product of (Sj) and (Sk) to Si     064ijk
Si  Sj*HSk        Half-precision rounded floating-point             065ijk
                  product of (Sj) and (Sk) to Si
Si  Sj*RSk        Rounded floating-point product of (Sj) and        066ijk
                  (Sk) to Si
Si  Sj*ISk        Reciprocal iteration; 2-(Sj)*(Sk) to Si           067ijk



The Floating-point Multiply functional unit executes instructions 064 
through 067. Operands are assumed to be in floating-point format. The 
result is not guaranteed to be normalized if the operands are not 
normalized. 

Instruction 064 forms the product of the floating-point quantities in 
Sj and Sk and enters the result into Si. 

Instruction 065 forms the half-precision rounded product of the 
floating-point quantities in Sj and Sk and enters the result into 
Si. The low-order 19 bits of the result are cleared. 

Instruction 066 forms the rounded product of the floating-point 
quantities in Sj and Sk and enters the result into Si. 

Instruction 067 forms two minus the product of the floating-point 
quantities in Sj and Sk and enters the result into Si. This instruction 
is used in the divide sequence as described in section 4 under 
Floating-point Arithmetic. 

In the evaluation C = 2-B*A, B must be a reciprocal of A of less than 47 
significant bits and not the exact reciprocal; otherwise, C will be in 
error. The reciprocal produced by the reciprocal approximation 
instruction meets this criterion. 
HOLD ISSUE CONDITIONS:  Si reserved
                        Sj or Sk reserved (except S0)
                        Instructions 160 through 167 in process, unit
                        busy (VL) + 4 CPs
                        Instructions 140 through 145 in process, Second
                        Vector Logical unit busy (VL) + 4 CPs

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 7 CPs

SPECIAL CASES:          (Sj)=0 if j=0.
                        (Sk)=2^63 if k=0.

                        If both exponent fields are 0, an integer
                        multiply is performed. Correct integer multiply
                        results are produced if the following conditions
                        are met:

                        • Both operand sign bits are 0

                        • The sum of the 0 bits to the right of the
                          least significant 1 bit in the two operands
                          is greater than or equal to 48

                        The integer result obtained is the high-order 48
                        bits of the 96-bit product of the two operands.
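
The integer multiply case can be illustrated with a small Python model. It is only a sketch under simplifying assumptions: it multiplies the two 48-bit coefficient fields exactly and keeps the high-order 48 bits, ignoring whatever truncation the hardware performs on the low-order partial products (which is presumably why the trailing-zero condition exists). All names are hypothetical.

```python
COEFFICIENT_BITS = 48
COEFFICIENT_MASK = (1 << COEFFICIENT_BITS) - 1


def trailing_zero_bits(value):
    """Number of 0 bits to the right of the least significant 1 bit."""
    assert value > 0
    return (value & -value).bit_length() - 1


def meets_zero_bit_condition(coef_j, coef_k):
    """Second condition from the text: at least 48 trailing 0 bits in total."""
    return trailing_zero_bits(coef_j) + trailing_zero_bits(coef_k) >= 48


def integer_product(coef_j, coef_k):
    """High-order 48 bits of the 96-bit product of two 48-bit operands."""
    product = (coef_j & COEFFICIENT_MASK) * (coef_k & COEFFICIENT_MASK)
    return product >> COEFFICIENT_BITS


# Pre-shifting each integer left by 24 places gives the needed 48
# trailing 0 bits, so the high-order half is the plain integer product.
result = integer_product(7 << 24, 6 << 24)
```

Any split of the 48 zero bits between the operands works; for instance, one operand shifted left 40 places and the other 8 places.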
INSTRUCTION 070

CAL Syntax        Description                                     Octal Code

Si  /HSj          Floating-point reciprocal approximation of      070ij0
                  (Sj) to Si



The Reciprocal Approximation functional unit executes instruction 070. 

Instruction 070 forms an approximation to the reciprocal of the 
normalized floating-point quantity in Sj and enters the result into 
Si. This instruction occurs in the divide sequence to compute the 
quotient of two floating-point quantities as described in section 4 under 
Floating-point Arithmetic. 

The reciprocal approximation instruction produces a result of 30 
significant bits. The low-order 18 bits are zeros. The number of 
significant bits can be extended to 48 using the reciprocal iteration 
instruction and a multiply. 
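
The way one reciprocal iteration and one multiply extend the precision can be sketched with exact rational arithmetic in Python. This is not a model of the hardware: truncate_to_bits merely stands in for a 30-bit approximation, and the function names are hypothetical. The iteration step is the 2 - (Sj)*(Sk) form of instruction 067.

```python
from fractions import Fraction


def truncate_to_bits(value, bits):
    """Keep the `bits` most significant bits of a positive Fraction."""
    two = Fraction(2)
    shift = 0
    while value * two**shift < 2**(bits - 1):
        shift += 1
    while value * two**shift >= 2**bits:
        shift -= 1
    return Fraction(int(value * two**shift)) / two**shift


def refine_reciprocal(a, approx):
    """One reciprocal iteration (2 - a*approx) followed by a multiply."""
    correction = 2 - a * approx
    return approx * correction


a = Fraction(3)
approx = truncate_to_bits(1 / a, 30)      # about 30 significant bits
refined = refine_reciprocal(a, approx)

error_before = abs(1 - a * approx)
error_after = abs(1 - a * refined)
```

The relative error is squared by the step, so an approximation good to roughly 30 bits becomes good to well beyond 48 bits in this idealized arithmetic.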



HOLD ISSUE CONDITIONS:  Si reserved
                        Sj reserved (except S0)
                        Instruction 174 in process, unit busy (VL) + 4 CPs

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 14 CPs

SPECIAL CASES:          (Si) is meaningless if (Sj) is not
                        normalized; the unit assumes that bit 2^47 of
                        (Sj)=1; no test is made of this bit.

                        (Sj)=0 produces a range error; the result is
                        meaningless.

                        (Sj)=0 if j=0.
INSTRUCTION 071

CAL Syntax        Description                                     Octal Code

Si  Ak            Transmit (Ak) to Si with no sign extension      071i0k
Si  +Ak           Transmit (Ak) to Si with sign extension         071i1k
Si  +FAk          Transmit (Ak) to Si as unnormalized             071i2k
                  floating-point number
Si  0.6           Transmit constant 0.75 x 2^48 to Si             071i30
Si  0.4           Transmit constant 0.5 to Si                     071i40
Si  1.            Transmit constant 1.0 to Si                     071i50
Si  2.            Transmit constant 2.0 to Si                     071i60
Si  4.            Transmit constant 4.0 to Si                     071i70



Instruction 071 performs functions that depend on the value of the j
designator. The functions are concerned with transmitting information
from an A register to an S register and with generating frequently used
floating-point constants.

When the j designator is 0, the 24-bit value in Ak is transmitted to
Si. The value is treated as an unsigned integer. The high-order bits
of Si are zeros.

When the j designator is 1, the 24-bit value in Ak is transmitted to
Si. The value is treated as a signed integer. The sign bit of Ak is
extended through the high-order bit of Si.
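
The difference between the j=0 and j=1 forms is a matter of zero extension versus sign extension, as the following Python sketch shows. The function name is hypothetical; registers are modeled as unsigned integers.

```python
MASK64 = (1 << 64) - 1
A_MASK = (1 << 24) - 1      # A registers hold 24 bits
A_SIGN = 1 << 23            # sign bit of a 24-bit A register value


def transmit_a_to_s(ak, j):
    """Instruction 071 with j=0 (unsigned) or j=1 (sign extended)."""
    ak &= A_MASK
    if j == 0:
        return ak                          # high-order bits are zeros
    if j == 1:
        if ak & A_SIGN:
            return ak | (MASK64 & ~A_MASK)  # copy the sign bit upward
        return ak
    raise ValueError("only j=0 and j=1 are modeled here")
```

For an A register holding all ones, the j=0 form yields 2^24 - 1 while the j=1 form yields the 64-bit representation of -1.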

When the j designator is 2, the 24-bit value in Ak is transmitted to
Si as an unnormalized floating-point quantity (the result is then added
to 0 to normalize). For this instruction, the exponent in bits
2^62 through 2^48 is set to 40060 (octal). The sign of the coefficient is
set according to the sign of Ak. If the sign bit of Ak is set, the
twos complement of Ak is entered into Si as the magnitude of the
coefficient and bit 2^63 of Si is set for the sign of the coefficient.

A sequence of instructions is used to convert an integer whose absolute
value is less than 24 bits to floating-point format:

     CAL code:     A1    S1
                   S1    +FA1
                   S1    +FS1          9 CPs required
When the j designator is 3, the floating-point constant 0.75 x 2^48
is entered into Si (0 40060 6000 0000 0000 0000 octal). This constant is
used to create floating-point numbers from integer numbers (positive and
negative) whose absolute value is less than 47 bits. A sequence of
instructions is used for conversion of an integer in S1:

     CAL code:     S2    0.6
                   S1    S2-S1
                   S1    S2-FS1        11 CPs required

When the j designator is 4, the floating-point constant 0.5
(0 40000 4000 0000 0000 0000 octal) is entered into Si.

When the j designator is 5, the floating-point constant 1.0
(0 40001 4000 0000 0000 0000 octal) is entered into Si.

When the j designator is 6, the floating-point constant 2.0
(0 40002 4000 0000 0000 0000 octal) is entered into Si.

When the j designator is 7, the floating-point constant 4.0
(0 40003 4000 0000 0000 0000 octal) is entered into Si.
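
The octal patterns of these constants can be checked against their stated values with a small decoder. The sketch below assumes the floating-point word layout that section 4 describes: sign in bit 2^63, a 15-bit exponent in bits 2^62 through 2^48 biased by 40000 (octal), and a 48-bit coefficient treated as a fraction. The function names are hypothetical.

```python
EXPONENT_BIAS = 0o40000
COEFFICIENT_MASK = (1 << 48) - 1


def from_octal(text):
    """Parse a spaced octal word such as '0 40000 4000 0000 0000 0000'."""
    return int(text.replace(" ", ""), 8)


def decode(word):
    """Value of a 64-bit floating-point word under the assumed layout."""
    sign = -1 if (word >> 63) & 1 else 1
    exponent = ((word >> 48) & 0o77777) - EXPONENT_BIAS
    fraction = (word & COEFFICIENT_MASK) / 2**48
    return sign * fraction * 2.0**exponent


half = decode(from_octal("0 40000 4000 0000 0000 0000"))
one = decode(from_octal("0 40001 4000 0000 0000 0000"))
two = decode(from_octal("0 40002 4000 0000 0000 0000"))
four = decode(from_octal("0 40003 4000 0000 0000 0000"))
magic = decode(from_octal("0 40060 6000 0000 0000 0000"))
```

The coefficient 4000 0000 0000 0000 (octal) is the fraction 0.5, so the constants 1.0, 2.0, and 4.0 differ from 0.5 only in the exponent field.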



HOLD ISSUE CONDITIONS:  Si reserved
                        Ak reserved (except A0); applies to all forms
                        of the instruction, that is, j designators
                        0 through 7.

EXECUTION TIME:         Instruction issue, 1 CP
                        Si ready, 2 CPs

SPECIAL CASES:          (Ak)=1 if k=0.
                        (Si)=(Ak) if j=0.
                        (Si)=(Ak) sign extended if j=1.
                        (Si)=(Ak) unnormalized if j=2.
                        (Si)=0.6 x 2^60 (octal) if j=3.
                        (Si)=0.4 x 2^0 (octal) if j=4.
                        (Si)=0.4 x 2^1 (octal) if j=5.
                        (Si)=0.4 x 2^2 (octal) if j=6.
                        (Si)=0.4 x 2^3 (octal) if j=7.
INSTRUCTIONS 072 - 075

CAL Syntax        Description                           Octal Code

Si   RT           Transmit (RTC) to Si                  072i00
Si   SM           Read semaphores to Si                 072i02
Si   STj          Read (STj) register to Si             072ij3
Si   VM           Transmit (VM) to Si                   073i00
†                 Read performance counter into Si      073i11
†                 Increment performance counter         073i21
†                 Clear all maintenance modes           073i31
Si   SR0          Transmit (SR0) to Si                  073i01
SM   Si           Load semaphores from Si               073i02
STj  Si           Load (STj) register from Si           073ij3
Si   Tjk          Transmit (Tjk) to Si                  074ijk
Tjk  Si           Transmit (Si) to Tjk                  075ijk

† Not currently supported



Instruction 072i00 enters the 64-bit value of the real-time clock (RTC)
into Si. The clock is incremented by 1 each CP. The RTC can be set
only by the monitor through use of instruction 0014j0.

Instruction 072i02 enters the values of all of the semaphores into
Si. The 32-bit SM register is left-justified in Si with SM00
occupying the sign bit.

Instruction 072ij3 enters the contents of STj into Si.

Instruction 073i00 enters the 64-bit value of the VM register into
Si. The VM register is usually read after being set by instruction 175.

Instruction 073i11 is used for performance monitoring and is privileged
to monitor mode. Each execution of the 073i11 instruction advances a
pointer and enters either the high-order or low-order bits of a
performance counter into the high-order bits of Si. Refer to appendix
C for information on performance monitoring.

Instruction 073i31 is part of the SECDED maintenance mode functions and
is executed only if the maintenance mode switch on the mainframe's
control panel is on. Instruction 073i31 clears all three SECDED
maintenance mode instructions: 001501, 001521, and 001531. Refer to
appendix D for complete information on the SECDED maintenance modes.

Instruction 073i01 sets the low-order 32 bits to 1's and returns the
following status to the high-order bits of Si:

     Si Bit    Description

     2^63      Clustered, CLN ≠ 0 (CL)
     2^57      Program state (PS)
     2^51      Floating-point error occurred (FPS)
     2^50      Floating-point interrupt enabled (IFP)
     2^49      Operand range interrupt enabled (IOR)
     2^48      Bidirectional memory enabled (BDM)
     2^41†     Processor number bit 1 (PN1)††
     2^40†     Processor number bit 0 (PN0)††
     2^34†     Cluster number bit 2 (CLN2)
     2^33†     Cluster number bit 1 (CLN1)
     2^32†     Cluster number bit 0 (CLN0)

Instruction 073i02 sets the semaphores from 32 high-order bits of 
Si. SM00 receives the sign bit of Si. 
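
The placement of the semaphores in Si for instructions 072i02 and 073i02 can be sketched in Python. The function names and the list representation of the SM register (index 0 standing for SM00) are assumptions made for illustration; the CLN=0 special cases are not modeled.

```python
def read_semaphores(sm):
    """072i02: place 32 semaphore bits left-justified in a 64-bit word."""
    assert len(sm) == 32
    si = 0
    for n, bit in enumerate(sm):
        si |= (bit & 1) << (63 - n)   # SM00 lands in the sign bit
    return si


def load_semaphores(si):
    """073i02: take the 32 high-order bits of Si as SM00 through SM31."""
    return [(si >> (63 - n)) & 1 for n in range(32)]


sm = [1] + [0] * 30 + [1]             # SM00 and SM31 set
word = read_semaphores(sm)
```

Reading and then loading returns the same semaphore values, since both directions use the high-order half of the word.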

Instruction 073ij3 enters the contents of Si into STj.

Instruction 074 enters the contents of Tjk into Si. 

Instruction 075 enters the contents of Si into Tjk. 



HOLD ISSUE CONDITIONS:  Si reserved

                        For instructions 074 and 075, instructions 036
                        through 037 in process

                        For instruction 074, instruction 075 issued in
                        the previous CP

†  These bit positions return a value of 0 if not executed in monitor
   mode.
†† The maintenance panel switch setting determines the processor number.

HOLD ISSUE CONDITIONS:  For instruction 073i00:
(continued)
                          Instruction 14x or 175 in process, VM busy
                          for (VL) + 5 CPs

                          Instruction 003 in process, VM busy for 1 CP

                        For instructions 072ij3, 073ij3, and
                        073i02, hold issue 1 CP, then 2† CPs more
                        after Si not reserved. Minimum 3-CP hold.

EXECUTION TIME:         Instruction issue, 1 CP

                        All cases except 073ij3, result register ready,
                        1 CP

                        For 073i02, SM ready, 1 CP

SPECIAL CASES:          For instructions 072i02 and 072ij3, (Si)=0
                        if CLN=0.

                        Instructions 073i02 and 073ij3 are no-ops if
                        CLN=0.

† If more than one CPU attempts to access semaphores or shared registers
  in the same CP, a scanner resolves the conflict. Refer to shared
  register explanation in section 2.
INSTRUCTIONS 076 - 077

CAL Syntax        Description                              Octal Code

Si     Vj,Ak      Transmit (Vj element (Ak)) to Si         076ijk
Vi,Ak  Sj         Transmit (Sj) to Vi element (Ak)         077ijk
Vi,Ak  0†         Clear Vi element (Ak)                    077i0k

† Special CAL syntax



Instructions 076 and 077 transmit a 64-bit quantity between a V register
element and an S register.

Instruction 076 transmits the contents of an element of register Vj to
Si.

Instruction 077 transmits the contents of register Sj to an element of
register Vi.

The low-order 6 bits of (Ak) determine the vector element for either
instruction.
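
Element selection for these two instructions can be sketched in Python, with a V register modeled as a list of 64 values. The function names are hypothetical; the j=0 and k=0 substitutions are the special cases listed for these instructions.

```python
def element_index(ak, k=1):
    """Low-order 6 bits of (Ak); (Ak)=1 if k=0."""
    if k == 0:
        ak = 1
    return ak & 0o77


def read_element(vj, ak, k=1):
    """Instruction 076: (Vj element (Ak)) to Si."""
    return vj[element_index(ak, k)]


def write_element(vi, ak, sj, j=1, k=1):
    """Instruction 077: (Sj) to Vi element (Ak); (Sj)=0 if j=0."""
    vi[element_index(ak, k)] = 0 if j == 0 else sj


v = list(range(64))
picked = read_element(v, 0o105)       # only the low 6 bits (05) are used
write_element(v, 3, 99)
write_element(v, 4, 123, j=0)         # the "clear element" form
```

Because only 6 bits are used, any (Ak) value selects one of the 64 elements; higher-order bits are ignored.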



HOLD ISSUE CONDITIONS:  Ak reserved (except A0)

                        For instruction 076, Si reserved or Vj
                        reserved as operand or as result

                        For instruction 077, Vi reserved as operand or
                        as result or Sj reserved

EXECUTION TIME:         Instruction issue, 1 CP

                        For instruction 076, Si ready, 4 CPs

                        For instruction 077, Vi ready, 1 CP

SPECIAL CASES:          (Sj)=0 if j=0.

                        (Ak)=1 if k=0.
INSTRUCTIONS 10h - 13h

CAL Syntax        Description                           Octal Code

Ai   exp,Ah       Read from ((Ah) + jkm) to Ai          10hijkm
Ai   exp,0†       Read from (jkm) to Ai                 100ijkm
Ai   exp,†        Read from (jkm) to Ai                 100ijkm
Ai   ,Ah†         Read from (Ah) to Ai                  10hi000
exp,Ah  Ai        Store (Ai) to (Ah) + jkm              11hijkm
exp,0   Ai†       Store (Ai) to jkm                     110ijkm
exp,    Ai†       Store (Ai) to exp                     110ijkm
,Ah     Ai†       Store (Ai) to (Ah)                    11hi000
Si   exp,Ah       Read from ((Ah) + jkm) to Si          12hijkm
Si   exp,0†       Read from (exp) to Si                 120ijkm
Si   exp,†        Read from (exp) to Si                 120ijkm
Si   ,Ah†         Read from (Ah) to Si                  12hi000
exp,Ah  Si        Store (Si) to (Ah) + jkm              13hijkm
exp,0   Si†       Store (Si) to exp                     130ijkm
exp,    Si†       Store (Si) to exp                     130ijkm
,Ah     Si†       Store (Si) to (Ah)                    13hi000

† Special CAL syntax



The 2-parcel instructions 10h through 13h transmit data between
memory and an A register or an S register.

If the enhanced addressing mode bit in the Exchange Package is not set,
the contents of Ah (treated as a 22-bit signed integer) are added to
the signed 22-bit integer in the jkm field to determine the memory
address. Database address bits 2^22 and 2^23 determine which 4 million
words of memory are used. If the enhanced addressing mode (EAM) bit of
the Exchange Package is set, the contents of Ah (treated as a 24-bit
integer) are added to the sign extended 24-bit integer in the jkm field
to determine the memory address.
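
The two addressing modes can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the sum simply wraps to 22 bits (EAM clear) or 24 bits (EAM set); it covers only the address arithmetic described here, not the base address or range checking.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a twos complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def effective_address(ah, jkm, eam):
    """Sum of (Ah) and the jkm field for instructions 10h through 13h."""
    if eam:
        # 24-bit (Ah) plus the 22-bit jkm field sign extended to 24 bits.
        return (ah + sign_extend(jkm, 22)) & 0xFFFFFF
    # 22-bit signed (Ah) plus 22-bit signed jkm, kept to 22 bits.
    return (sign_extend(ah, 22) + sign_extend(jkm, 22)) & 0x3FFFFF


# A jkm field of all ones is -1 in either mode.
below = effective_address(0x000100, 0x3FFFFF, eam=False)
wide = effective_address(0x400000, 0x3FFFFF, eam=True)
```

When h is 0 the caller passes 0 for (Ah), so the address is the jkm field alone.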

If h is 0, (Ah) is 0 and only the jkm field is used for the address.
The address arithmetic is performed by an address adder similar to but
separate from the Address Add functional unit.

Instructions 10h and 11h transmit 24-bit quantities to or from A
registers. When transmitting data from memory to an A register, the
high-order 40 bits of the memory word are ignored. On a store from Ai
into memory, the high-order 40 bits of the memory word are zeroed.

Instructions 12h and 13h transmit 64-bit quantities to or from
register Si.



HOLD ISSUE CONDITIONS:  Port A, B, or C busy

                        Ah reserved or busy previous CP

                        For instructions 10h and 11h, Ai reserved

                        For instructions 12h and 13h, Si reserved

                        Instructions 10x through 13x in CP 2 and CP
                        3 and conflict

                        Second parcel not in a buffer

                        Second parcel in different buffer, 2 CPs

EXECUTION TIME:         Instruction issue:

                          Both parcels in same buffer, 2 CPs

                        For instruction 10h, Ai ready, 14 CPs

                        For instruction 12h, Si ready, 14 CPs

                        Bank ready for next scalar read or store, 4 CPs

                                  NOTE

                   After issuing instructions 10h through
                   13h, attempting to issue instructions
                   034 through 037, 176, or 177 causes
                   Ports A, B, or C to be considered busy
                   for 4 CPs (plus additional CPs if there
                   are conflicts).

SPECIAL CASES:          If the EAM bit of the Exchange Package is set,
                        the jkm field is sign-extended to 24 bits.
INSTRUCTIONS 140 - 147

CAL Syntax        Description                                     Octal Code

Vi  Sj&Vk         Logical products of (Sj) and (Vk elements)      140ijk
                  to Vi elements
Vi  Vj&Vk         Logical products of (Vj elements) and           141ijk
                  (Vk elements) to Vi elements
Vi  Sj!Vk         Logical sums of (Sj) and (Vk elements)          142ijk
                  to Vi elements
Vi  Vk†           Transmit (Vk elements) to Vi elements           142i0k
Vi  Vj!Vk         Logical sums of (Vj elements) and               143ijk
                  (Vk elements) to Vi elements
Vi  Sj\Vk         Logical differences of (Sj) and (Vk elements)   144ijk
                  to Vi elements
Vi  Vj\Vk         Logical differences of (Vj elements) and        145ijk
                  (Vk elements) to Vi elements
Vi  0†            Clear Vi elements                               145iii
Vi  Sj!Vk&VM      If VM bit=1, transmit (Sj) to the               146ijk
                  corresponding element in Vi. If VM bit=0,
                  transmit the (corresponding Vk element) to the
                  (corresponding Vi element).
Vi  #VM&Vk†       If VM bit=1, transmit (0) to the corresponding  146i0k
                  element in Vi. If VM bit=0, transmit the
                  (corresponding Vk element) to the
                  (corresponding Vi element).
Vi  Vj!Vk&VM      If VM bit=1, transmit the (corresponding Vj     147ijk
                  element) to the (corresponding Vi element).
                  If VM bit=0, transmit the (corresponding Vk
                  element) to the (corresponding Vi element).

† Special CAL syntax



Instructions 140 through 145 can be executed in either the Full Vector
Logical or the Second Vector Logical functional units, provided the
Second Vector Logical unit is enabled. If the Second Vector Logical unit
is disabled, instructions 140 through 145 can be executed only in the
Full Vector Logical unit. Instructions 146 and 147 execute in the
Full Vector Logical unit only. The number of operations performed is
determined by the contents of the VL register. All operations start with
element 0 of the Vi, Vj, or Vk register and increment the element number
by 1 for each operation performed. All results are delivered to Vi.

For instructions 140, 142, 144, and 146, a copy of the content of Sj is
delivered to the functional unit. The copy of the content is held as one
of the operands until completion of the operation. Therefore, Sj can
be changed immediately without affecting the vector operation. For
instructions 141, 143, 145, and 147, all operands are obtained from V
registers.

Instructions 140 and 141 form the logical products (AND) of operand pairs
and enter the result into Vi. Bits of an element of Vi are set to 1
when the corresponding bits of (Sj) or (Vj element) and (Vk element)
are 1, as in the following:

     (Sj) or (Vj element) = 1 1 0 0
     (Vk element)         = 1 0 1 0
     (Vi element)         = 1 0 0 0

Instructions 142 and 143 form the logical sums (inclusive OR) of operand
pairs and deliver the results to Vi. Bits of an element of Vi are set
to 1 when one of the corresponding bits of (Sj) or (Vj element) and
(Vk element) is 1, as in the following:

     (Sj) or (Vj element) = 1 1 0 0
     (Vk element)         = 1 0 1 0
     (Vi element)         = 1 1 1 0

Instructions 144 and 145 form the logical differences (exclusive OR) of
operand pairs and deliver the results to Vi. Bits of an element are set
to 1 when the corresponding bit of (Sj) or (Vj element) is different
from (Vk element), as in the following:

     (Sj) or (Vj element) = 1 1 0 0
     (Vk element)         = 1 0 1 0
     (Vi element)         = 0 1 1 0

Instructions 146 and 147 transmit operands to Vi depending on the VM
register contents. Bit 2^63 of the mask corresponds to element 0 of a V
register. Bit 2^0 corresponds to element 63. Operand pairs used for the
selection depend on the instruction. For instruction 146, the first
operand is always (Sj), the second operand is (Vk element). For
instruction 147, the first operand is (Vj element) and the second
operand is (Vk element). If bit n of the vector mask is 1, the first
operand is transmitted; if bit n of the mask is 0, the second operand,
(Vk element), is selected.

Example 1:

If instruction 146 is to be executed and the following register
conditions exist:

     (VL) = 4
     (VM) = 0 60000 0000 0000 0000 0000
     (S2) = -1
     (V600) = 1
     (V601) = 2
     (V602) = 3
     (V603) = 4

Instruction 146726 is executed. Following execution, the first four
elements of V7 contain the following values:

     (V700) = 1
     (V701) = -1
     (V702) = -1
     (V703) = 4

The remaining elements of V7 are unaltered.

Example 2:

If instruction 147 is to be executed and the following register
conditions exist:

     (VL) = 4
     (VM) = 0 60000 0000 0000 0000 0000
     (V200) = 1          (V300) = -1
     (V201) = 2          (V301) = -2
     (V202) = 3          (V302) = -3
     (V203) = 4          (V303) = -4

Instruction 147123 is executed. Following execution, the first four
elements of V1 contain the following values:

     (V100) = -1
     (V101) = 2
     (V102) = 3
     (V103) = -4

The remaining elements of V1 are unaltered.
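
Both examples can be reproduced with a short Python model of the mask-controlled merge. The function names are hypothetical, V registers are modeled as lists of 64 values, and the VM value is read as the octal word 0 60000 0000 0000 0000 0000 (bits 2^62 and 2^61 set, selecting elements 1 and 2).

```python
def vector_merge(vm, first, second, vl, vi):
    """Select first[n] where mask bit n is 1, else second[n], for n < VL."""
    for n in range(vl):
        mask_bit = (vm >> (63 - n)) & 1    # bit 2^63 controls element 0
        vi[n] = first[n] if mask_bit else second[n]
    return vi


def from_octal(text):
    return int(text.replace(" ", ""), 8)


VM = from_octal("0 60000 0000 0000 0000 0000")

# Example 1 (instruction 146): the first operand is (S2) for every element.
v6 = [1, 2, 3, 4] + [0] * 60
v7 = vector_merge(VM, [-1] * 64, v6, 4, [99] * 64)

# Example 2 (instruction 147): the first operand comes from V2.
v2 = [1, 2, 3, 4] + [0] * 60
v3 = [-1, -2, -3, -4] + [0] * 60
v1 = vector_merge(VM, v2, v3, 4, [99] * 64)
```

The placeholder value 99 stands for whatever the destination elements held before; elements at or beyond VL keep it, matching "the remaining elements are unaltered".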

HOLD ISSUE CONDITIONS:  Vk reserved as operand

                        Vi reserved as operand or result

                        For instructions 140, 142, 144, and 146, Sj
                        reserved

                        For instructions 141, 143, 145, and 147, Vj
                        reserved as operand

                        For instructions 146 and 147, or instructions 140
                        through 145 with Second Vector Logical unit
                        disabled:

                          Instruction 14x or 175 in process, Full
                          Vector Logical unit busy (VL) + 4 CPs

                        For instructions 140 through 145 with Second
                        Vector Logical unit enabled:

                          Refer to discussion of Second Vector Logical
                          issue in section 4.

                          Instruction 140 through 145 or 16x in progress
                          in Second Vector Logical/Floating-point Multiply
                          unit, Second Vector Logical unit busy (VL) + 4 CPs

                          Instruction 140 through 147 or 175 in progress in
                          Full Vector Logical unit, Full Vector Logical
                          unit busy (VL) + 4 CPs

EXECUTION TIME:         Instruction issue, 1 CP

                        Vj or Vk ready in (VL) + 3 CPs if data
                        available†

                        Vi ready in (VL) + 7 CPs if data available†
                        for the Full Vector Logical unit; 9 CPs if
                        available for the Second Vector Logical unit.

                        Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES:          (Sj)=0 if j=0.

† Vector instructions may or may not start execution immediately; they
  execute as data becomes available. In particular, a memory conflict
  that slows execution of some elements of a vector load can cause
  delays in all instructions in the operation chain, starting with that
  load.
INSTRUCTIONS 150 - 151

CAL Syntax        Description                                     Octal Code

Vi  Vj<Ak         Shift (Vj) elements left by (Ak) places         150ijk
                  to Vi elements
Vi  Vj<1†         Shift (Vj) elements left one place to           150ij0
                  Vi elements
Vi  Vj>Ak         Shift (Vj) elements right by (Ak) places        151ijk
                  to Vi elements
Vi  Vj>1†         Shift (Vj) elements right one place to          151ij0
                  Vi elements

† Special CAL syntax



Instructions 150 and 151 are executed in the Vector Shift functional
unit. The number of operations performed is determined by the VL
register contents. Operations start with element 0 of the Vi and Vj
registers and end with elements specified by (VL) - 1.

All shifts are end off with zero fill. The shift count is obtained from
(Ak) and all 24 bits of Ak are used for the shift count. Elements of
Vi are cleared if the shift count exceeds 63. All shift counts (Ak)
are considered positive.

Unlike shift instructions 052 through 055, these instructions receive the
shift count from Ak, rather than the jk fields.
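
The element-by-element behavior can be sketched in Python as follows. The function name and the left flag are hypothetical; elements are modeled as unsigned 64-bit integers, and the k=0 case uses the special value (Ak)=1.

```python
MASK64 = (1 << 64) - 1


def vector_shift(vj, ak, vl, left=True, k=1):
    """Instructions 150 (left) and 151 (right) over elements 0..VL-1."""
    if k == 0:
        ak = 1                        # (Ak)=1 if k=0
    result = []
    for element in vj[:vl]:
        if ak > 63:
            result.append(0)          # cleared if the count exceeds 63
        elif left:
            result.append((element << ak) & MASK64)
        else:
            result.append((element & MASK64) >> ak)
    return result


shifted_left = vector_shift([1, 3, MASK64], 4, 2)
shifted_right = vector_shift([16, 1 << 63], 4, 2, left=False)
```

Only the first VL elements are processed; the third element in the first call is untouched because VL is 2.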



HOLD ISSUE CONDITIONS:  Vj reserved as operand

                        Vi reserved as operand or result

                        Ak reserved (except A0)

                        Instructions 150 through 153 in process, unit
                        busy (VL) + 4 CPs††

EXECUTION TIME:         Vj ready in (VL) + 3 CPs if data available††

                        Vi ready in (VL) + 8 CPs if data available††

                        Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES:          (Ak)=1 if k=0.

†† Vector instructions may or may not start execution immediately; they
   execute as data becomes available. In particular, a memory conflict
   that slows execution of some elements of a vector load can cause
   delays in all instructions in the operation chain, starting with that
   load.
INSTRUCTIONS 152 - 153

CAL Syntax        Description                                     Octal Code

Vi  Vj,Vj<Ak      Double shifts of (Vj elements) left (Ak)        152ijk
                  places to Vi elements
Vi  Vj,Vj<1†      Double shifts of (Vj elements) left one         152ij0
                  place to Vi elements
Vi  Vj,Vj>Ak      Double shifts of (Vj elements) right (Ak)       153ijk
                  places to Vi elements
Vi  Vj,Vj>1†      Double shifts of (Vj elements) right one        153ij0
                  place to Vi elements

† Special CAL syntax

The Vector Shift functional unit executes instructions 152 and 153. The
instructions shift 128-bit values formed by logically joining the
contents of two elements of the Vj register. The direction of the
shift determines whether the high-order bits or the low-order bits of the
result are sent to Vi. Shift counts are obtained from register Ak.

All shifts are end off with zero fill.

The number of operations is determined by the VL register contents.

Instruction 152 performs left shifts. The operation starts with element
0 of Vj. If (VL) is 1, element 0 is joined with 64 bits of 0, and the
resulting 128-bit quantity is then shifted left by the amount specified
by (Ak). Only the one operation is performed. The 64 high-order bits
remaining are transmitted to element 0 of Vi.

If (VL) is 2, the operation starts with element 0 of Vj being joined
with element 1, and the resulting 128-bit quantity is then shifted left
by the amount specified by (Ak). The high-order 64 bits remaining are
transmitted to element 0 of Vi. Figure 5-7 shows this operation.






[Diagram: (element 0) of Vj joined with (element 1) of Vj and shifted
left (Ak) places; 64-bit result to element 0 of Vi]

Figure 5-7. Vector Left Double Shift, First Element,
VL Greater than 1



If (VL) is greater than 2, the operation continues by joining element 1 
with element 2 and transmitting the 64-bit result to element 1 of Vi. 
Figure 5-8 shows this operation. 



[Diagram: (element 1) of Vj joined with (element 2) of Vj and shifted
left (Ak) places; 64-bit result to element 1 of Vi]

Figure 5-8. Vector Left Double Shift, Second Element,
VL Greater than 2



If (VL) is 2, element 1 is joined with 64 bits of 0 and only two
operations are performed. In general, the last element of Vj as
determined by (VL) is joined with 64 bits of zeros. Figure 5-9 shows
this operation.



[Diagram: (element (VL)-1†) of Vj joined with 64 bits of 0 and shifted
left (Ak) places; 64-bit result to element (VL)-1† of Vi]

Figure 5-9. Vector Left Double Shift, Last Element



If (Ak) is greater than or equal to 128, the result is all zeros. If
(Ak) is greater than 64, the result register contains at least (Ak) - 64
zeros.



Example 1:

If instruction 152 is to be executed and the following register
conditions exist:

(VL) = 4

(A1) = 3

(V4 element 00) = 0 00000 0000 0000 0000 0007

(V4 element 01) = 0 60000 0000 0000 0000 0005

(V4 element 02) = 1 00000 0000 0000 0000 0006

(V4 element 03) = 1 60000 0000 0000 0000 0007

Instruction 152541 is executed. Following execution, the first four
elements of V5 contain the following values:

(V5 element 00) = 0 00000 0000 0000 0000 0073

(V5 element 01) = 0 00000 0000 0000 0000 0054

(V5 element 02) = 0 00000 0000 0000 0000 0067

(V5 element 03) = 0 00000 0000 0000 0000 0070
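The left double shift described for instruction 152 can be checked with a short Python sketch. It is a software model under stated assumptions, not the hardware algorithm: `word` and `double_shift_left` are hypothetical helper names, V registers are modeled as lists of integers, and the octal operands are written as 22-digit words (a leading 0 or 1 for bit 2^63).

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def word(octal_text):
    """Parse a spaced octal word such as '0 60000 0000 0000 0000 0005'."""
    return int(octal_text.replace(" ", ""), 8)


def double_shift_left(vj, ak, vl):
    """Join element n with element n+1 (or 64 zero bits for the last
    element), shift the 128-bit value left ak places with zero fill,
    and keep the 64 high-order bits."""
    out = []
    for n in range(vl):
        low = vj[n + 1] if n + 1 < vl else 0
        joined = (vj[n] << 64) | low
        out.append(((joined << ak) & MASK128) >> 64)
    return out


# Operands of Example 1: (VL) = 4, (A1) = 3
v4 = [word("0 00000 0000 0000 0000 0007"),
      word("0 60000 0000 0000 0000 0005"),
      word("1 00000 0000 0000 0000 0006"),
      word("1 60000 0000 0000 0000 0007")]
v5 = double_shift_left(v4, 3, 4)
```

Under this model, `v5` holds the octal values ending in 0073, 0054, 0067, and 0070, in that order.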

Instruction 153 performs right shifts. The original element 0 of
Vj is joined with 64 high-order bits of 0 and the 128-bit quantity
is shifted right by the amount specified by (Ak). The 64 low-order
bits of the result are transmitted to element 0 of Vi. Figure 5-10
shows this operation.



† Elements are numbered 0 through 63 in the V registers; therefore,
element (VL)-1 refers to the VLth element.



[Diagram: 64 bits of 0 joined with (element 0) of Vj and shifted
right (Ak) places; 64-bit result to element 0 of Vi]

Figure 5-10. Vector Right Double Shift, First Element



If (VL)=1, only one operation is performed. In general, however,
instruction execution continues by joining element 0 with element 1,
shifting the 128-bit quantity by the amount specified by (Ak), and
transmitting the result to element 1 of Vi. Figure 5-11 shows this
operation.



[Diagram: (element 0) of Vj joined with (element 1) of Vj and shifted
right (Ak) places; 64-bit result to element 1 of Vi]

Figure 5-11. Vector Right Double Shift, Second Element,
VL Greater than 1



The last operation performed by the instruction joins the last element of
Vj as determined by (VL) with the preceding element. Figure 5-12 shows
this operation.



[Diagram: (element (VL)-2) of Vj joined with (element (VL)-1†) of Vj and
shifted right (Ak) places; 64-bit result to element (VL)-1 of Vi]

Figure 5-12. Vector Right Double Shift, Last Operation



Example 2:

If an instruction 153 is to be executed and the following register
conditions exist:

(VL) = 4

(A6) = 3

(V2 element 00) = 0 00000 0000 0000 0000 0017

(V2 element 01) = 0 60000 0000 0000 0000 0006

(V2 element 02) = 1 00000 0000 0000 0000 0006

(V2 element 03) = 1 60000 0000 0000 0000 0007

Instruction 153026 is executed and following execution, register V0
contains the following values:

(V0 element 00) = 0 00000 0000 0000 0000 0001

(V0 element 01) = 1 66000 0000 0000 0000 0000

(V0 element 02) = 1 50000 0000 0000 0000 0000

(V0 element 03) = 1 56000 0000 0000 0000 0000

The remaining elements of V0 are unaltered.
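The right double shift of instruction 153 can be modeled the same way. The sketch below is an assumption-laden illustration, not the hardware logic: `word` and `double_shift_right` are hypothetical names, and operands are written as 22-digit octal words with a leading 0 or 1 for bit 2^63.

```python
MASK64 = (1 << 64) - 1


def word(octal_text):
    """Parse a spaced octal word such as '1 60000 0000 0000 0000 0007'."""
    return int(octal_text.replace(" ", ""), 8)


def double_shift_right(vj, ak, vl):
    """Join element n-1 (or 64 zero bits for element 0) with element n,
    shift the 128-bit value right ak places with zero fill, and keep
    the 64 low-order bits."""
    out = []
    for n in range(vl):
        high = vj[n - 1] if n > 0 else 0
        joined = (high << 64) | vj[n]
        out.append((joined >> ak) & MASK64)
    return out


# Operands of Example 2: (VL) = 4, (A6) = 3
v2 = [word("0 00000 0000 0000 0000 0017"),
      word("0 60000 0000 0000 0000 0006"),
      word("1 00000 0000 0000 0000 0006"),
      word("1 60000 0000 0000 0000 0007")]
v0 = double_shift_right(v2, 3, 4)
```

Under this model, `v0` matches the four result words listed in Example 2.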



HOLD ISSUE CONDITIONS: Vj reserved as operand

Vi reserved as operand or result

Ak reserved (except A0)

Instructions 150 through 153 in process, unit
busy (VL) + 4 CPs††

EXECUTION TIME: Instruction issue, 1 CP

Vj ready in (VL) + 3 CPs if data available††

For instruction 152, Vi ready in (VL) + 9 CPs
if data available††

For instruction 153, Vi ready in (VL) + 8 CPs if
data available††

Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES: (Ak)=1 if k=0.



†† Vector instructions may or may not start execution immediately; they
execute as data becomes available. In particular, a memory conflict
that slows execution of some elements of a vector load can cause
delays in all instructions in the operation chain, starting with that
load.



INSTRUCTIONS 154 - 157



CAL Syntax      Description                                      Octal Code

Vi Sj+Vk        Integer sums of (Sj) and (Vk elements) to        154ijk
                Vi elements

Vi Vj+Vk        Integer sums of (Vj elements) and (Vk elements)  155ijk
                to Vi elements

Vi Sj-Vk        Integer differences of (Sj) and (Vk elements)    156ijk
                to Vi elements

Vi -Vk†         Transmit negative of (Vk elements) to Vi         156i0k
                elements

Vi Vj-Vk        Integer differences of (Vj elements) and         157ijk
                (Vk elements) to Vi elements

The Vector Add functional unit executes instructions 154 through 157.

Instructions 154 and 155 perform integer addition. Instructions 156 and
157 perform integer subtraction. The number of additions or subtractions
performed is determined by the VL register contents. All operations
start with element 0 of the V registers and increment the element number
by 1 for each operation performed. All results are delivered to elements
of Vi. No overflow is detected.

Instructions 154 and 156 deliver a copy of (Sj) to the functional unit
where the copy is retained as one of the operands until the vector
operation completes. The other operand is an element of Vk. For
instructions 155 and 157, both operands are obtained from V registers.
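A minimal Python sketch of the four integer operations follows. It assumes 64-bit two's complement wraparound as the meaning of "no overflow is detected", and the name `vector_integer_op` and its argument layout are hypothetical; registers are modeled as lists of unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def vector_integer_op(opcode, vk, vl, sj=0, vj=None):
    """Model instructions 154 through 157 over elements 0 .. vl-1.

    opcode : 0o154, 0o155, 0o156, or 0o157
    sj     : scalar operand for 154 and 156 (0 when j = 0)
    vj     : vector operand for 155 and 157
    """
    results = []
    for n in range(vl):
        # 154 and 156 reuse the retained copy of (Sj) for every element
        first = sj if opcode in (0o154, 0o156) else vj[n]
        if opcode in (0o154, 0o155):
            value = first + vk[n]
        else:
            value = first - vk[n]
        results.append(value & MASK64)  # wrap silently; overflow is not detected
    return results
```

With `sj=0`, opcode 156 yields the two's complement negative of each Vk element, which corresponds to the special syntax form.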



HOLD ISSUE CONDITIONS: Vk reserved as operand

Vi reserved as operand or result

Instructions 154 through 157 in process, unit
busy (VL) + 4 CPs††

For instructions 154 and 156, Sj reserved
(except S0)

For instructions 155 and 157, Vj reserved as
operand

EXECUTION TIME: Instruction issue, 1 CP

Vj or Vk ready in (VL) + 3 CPs if data
available††

Vi ready in (VL) + 8 CPs if data available††

Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES: For instruction 154, if j=0, then (Sj)=0 and (Vi
element)=(Vk element).

For instruction 156, if j=0, then (Sj)=0 and (Vi
element)= -(Vk element).



† Special CAL syntax

†† Vector instructions may or may not start execution immediately; they
execute as data becomes available. In particular, a memory conflict
that slows execution of some elements of a vector load can cause
delays in all instructions in the operation chain, starting with that
load.



INSTRUCTIONS 160 - 167



CAL Syntax      Description                                      Octal Code

Vi Sj*FVk       Floating-point products of (Sj) and              160ijk
                (Vk elements) to Vi elements

Vi Vj*FVk       Floating-point products of (Vj elements)         161ijk
                and (Vk elements) to Vi elements

Vi Sj*HVk       Half-precision rounded floating-point products   162ijk
                of (Sj) and (Vk elements) to Vi elements

Vi Vj*HVk       Half-precision rounded floating-point products   163ijk
                of (Vj elements) and (Vk elements) to
                Vi elements

Vi Sj*RVk       Rounded floating-point products of (Sj) and      164ijk
                (Vk elements) to Vi elements

Vi Vj*RVk       Rounded floating-point products of               165ijk
                (Vj elements) and (Vk elements) to Vi elements

Vi Sj*IVk       Reciprocal iterations; 2-(Sj)*(Vk elements)      166ijk
                to Vi elements

Vi Vj*IVk       Reciprocal iterations; 2-(Vj elements)*          167ijk
                (Vk elements) to Vi elements



The Floating-point Multiply functional unit executes instructions 160
through 167. The number of operations performed by an instruction is
determined by the VL register contents. All operations start with
element 0 of the V registers and increment the element number by 1 for
each successive operation.

Operands are assumed to be in floating-point format. Instructions 160, 
162, 164, and 166 deliver a copy of (Sj) to the functional unit where 
the copy is retained as one of the operands until the completion of the 
operation. Therefore, Sj can be changed immediately without affecting 
the vector operation. The other operand is an element of Vk. For 
instructions 161, 163, 165, and 167, both operands are obtained from V 
registers. 

All results are delivered to elements of Vi. If either operand is not 
normalized, there is no guarantee that the products are normalized. If 
neither operand is normalized, the product is not normalized. 

Section 4 describes out-of-range conditions.

Instruction 160 forms the products of the floating-point quantity in 
Sj and the floating-point quantities in elements of Vk and enters 
the results into Vi. 

Instruction 161 forms the products of the floating-point quantities in 
elements of Vj and Vk and enters the results into Vi. 

Instruction 162 forms the half-precision rounded products of the 
floating-point quantity in Sj and the floating-point quantities in 
elements of Vk and enters the results into Vi. The low-order 19 
bits of the result elements are zeroed. 

Instruction 163 forms the half-precision rounded products of the 
floating-point quantities in elements of Vj and Vk and enters the 
results into Vi. The low-order 19 bits of the result elements are 
zeroed. 

Instruction 164 forms the rounded products of the floating-point 
quantity in Sj and the floating-point quantities in elements of Vk 
and enters the results into Vi. 

Instruction 165 forms the rounded products of the floating-point 
quantities in elements of Vj and Vk and enters the results into Vi. 

Instruction 166 forms for each element, two minus the product of the 
floating-point quantity in Sj and the floating-point quantity in 
elements of Vk. It then enters the results into Vi. Refer to the 
description of instruction 067 for more details. 

Instruction 167 forms for each element pair, two minus the product of 
the floating-point quantities in elements of Vj and Vk and enters 
the results into Vi . Refer to the description of instruction 067 for 
more details. 



HOLD ISSUE CONDITIONS: Vk reserved as operand

Vi reserved as operand or result

Instruction 16x in process, unit busy
(VL) + 4 CPs†

Instructions 140-145 in process in Second Vector
Logical unit. Unit busy (VL) + 4 CPs.

For instructions 160, 162, 164, and 166, Sj
reserved (except S0)

For instructions 161, 163, 165, and 167, Vj
reserved as operand

EXECUTION TIME: Instruction issue, 1 CP

Vj and Vk ready in (VL) + 3 CPs if data
available†

Vi ready in (VL) + 12 CPs if data available†

Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES: (Sj)=0 if j=0.



† Vector instructions may or may not start execution immediately; they
execute as data becomes available. In particular, a memory conflict
that slows execution of some elements of a vector load can cause
delays in all instructions in the operation chain, starting with that
load.



INSTRUCTIONS 170 - 173



CAL Syntax      Description                                      Octal Code

Vi Sj+FVk       Floating-point sums of (Sj) and (Vk elements)    170ijk
                to Vi elements

Vi +FVk†        Transmit normalized (Vk elements) to Vi          170i0k
                elements

Vi Vj+FVk       Floating-point sums of (Vj elements) and         171ijk
                (Vk elements) to Vi elements

Vi Sj-FVk       Floating-point differences of (Sj) and           172ijk
                (Vk elements) to Vi elements

Vi -FVk†        Transmit normalized negatives of                 172i0k
                (Vk elements) to Vi elements

Vi Vj-FVk       Floating-point differences of (Vj elements)      173ijk
                and (Vk elements) to Vi elements



† Special CAL syntax



The Floating-point Add functional unit executes instructions 170 through
173. Instructions 170 and 171 perform floating-point addition;
instructions 172 and 173 perform floating-point subtraction. The number
of additions or subtractions performed by an instruction is determined by
the VL register contents. All operations start with element 0 of the V
registers and increment the element number by 1 for each operation
performed. All results are delivered to Vi normalized, even if the
operands are not normalized.

Instructions 170 and 172 deliver a copy of (Sj) to the functional unit
where it remains as one of the operands until the completion of the
operation. The other operand is an element of Vk. For instructions
171 and 173, both operands are obtained from V registers. Section 4
describes the out-of-range conditions.



HOLD ISSUE CONDITIONS: Vk reserved as operand

Vi reserved as operand or result

Instructions 170 through 173 in process, unit
busy (VL) + 4 CPs††

For instructions 170 and 172, Sj reserved
(except S0)

For instructions 171 and 173, Vj reserved as
operand

EXECUTION TIME: Instruction issue, 1 CP

Vj and Vk ready in (VL) + 3 CPs if data
available††

Vi ready in (VL) + 11 CPs if data available††

Unit ready, (VL) + 4 CPs if data available††

SPECIAL CASES: (Sj)=0 if j=0.



†† Vector instructions may or may not start execution immediately; they
execute as data becomes available. In particular, a memory conflict
that slows execution of some elements of a vector load can cause
delays in all instructions in the operation chain, starting with that
load.



INSTRUCTION 174



CAL Syntax      Description                                      Octal Code

Vi /HVj         Floating-point reciprocal approximation of       174ij0
                (Vj elements) to Vi elements



The Reciprocal Approximation functional unit executes instruction 174. 
The instruction forms an approximate value of the reciprocal of the 
normalized floating-point quantity in each element of Vj and enters the 
result into elements of Vi. The number of elements for which 
approximations are found is determined by VL register contents. 

Instruction 174 occurs in the divide sequence to compute the quotients of 
floating-point quantities as described in section 4 under Floating-point 
Arithmetic. 

The reciprocal approximation instruction produces results of 30 
significant bits. The low-order 18 bits are zeros. The number of 
significant bits can be extended to 48 using the reciprocal iteration 
instruction and a multiply. 
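The way one reciprocal iteration and one multiply extend a 30-bit approximation can be illustrated numerically. The sketch below uses ordinary Python floats rather than the machine's floating-point format, and the helper names (`truncate_mantissa`, `reciprocal_approx`, `reciprocal_iterate`) are hypothetical; truncating the quotient to 30 mantissa bits only stands in for the approximation unit, whose actual algorithm is not described here.

```python
import math


def truncate_mantissa(x, bits):
    """Keep only the top `bits` bits of the mantissa of x."""
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)


def reciprocal_approx(d, bits=30):
    """Stand-in for instruction 174: 1/d good to about `bits` bits."""
    return truncate_mantissa(1.0 / d, bits)


def reciprocal_iterate(d, x0):
    """One refinement step: the iteration factor 2 - d*x0 (the form
    computed by the reciprocal iteration instructions), followed by
    one multiply by x0."""
    factor = 2.0 - d * x0
    return x0 * factor


d = 3.0
x0 = reciprocal_approx(d)
x1 = reciprocal_iterate(d, x0)
```

Each step of this kind roughly squares the relative error, which is why a single iteration is enough to go from about 30 significant bits to the full coefficient width.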



HOLD ISSUE CONDITIONS: Vi reserved as operand or result

Vj reserved as operand

Instruction 174 in process, unit busy for
(VL) + 4 CPs†

EXECUTION TIME: Instruction issue, 1 CP

Vj ready in (VL) + 3 CPs if data available†

Vi ready in (VL) + 19 CPs if data available†

Unit ready, (VL) + 4 CPs if data available†

SPECIAL CASES: (Vi element) is meaningless if (Vj element)
is not normalized; the unit assumes that
bit 2^47 of (Vj element) is 1; no test of this
bit is made.



† Vector instructions may or may not start execution immediately; they
execute as data becomes available. In particular, a memory conflict
that slows execution of some elements of a vector load can cause
delays in all instructions in the operation chain, starting with that
load.



INSTRUCTIONS 174ij1 - 174ij2



CAL Syntax      Description                                      Octal Code

Vi PVj          Population count of (Vj elements) to Vi          174ij1
                elements

Vi QVj          Population count parity of (Vj elements) to      174ij2
                Vi elements



















The Vector Population/Parity functional unit executes instructions
174ij1 and 174ij2, sharing some logic with the Reciprocal
Approximation functional unit.

Instruction 174ij1 counts the number of bits set to 1 in each element
of Vj and enters the results into corresponding elements of Vi. The
results are entered into the low-order 7 bits of each Vi element; the
remaining high-order bits of each Vi element are zeroed.

Instruction 174ij2 counts the number of bits set to 1 in each element
of Vj. The least significant bit of each element result shows whether
the result is an odd or even number. Only the least significant bit of
each element is transferred to the least significant bit position of the
corresponding element of register Vi. The remainder of the element is
set to zeros. The actual population count results are not transferred.
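Both operations are easy to express in Python. The following is a minimal sketch with hypothetical function names, treating each element as an unsigned 64-bit integer.

```python
MASK64 = (1 << 64) - 1


def population_count(vj, vl):
    """Number of 1 bits in each of the first vl elements (0 through 64,
    so the result always fits in the low-order 7 bits)."""
    return [bin(element & MASK64).count("1") for element in vj[:vl]]


def population_parity(vj, vl):
    """Least significant bit of each population count: 1 if odd, 0 if even."""
    return [count & 1 for count in population_count(vj, vl)]
```

A count of 64 (all bits set) is the reason 7 result bits are needed rather than 6.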



HOLD ISSUE CONDITIONS: Vi reserved as operand or result

Vj reserved as operand

Instructions 174xx1 and 174xx2 in process,
unit busy for (VL) + 4 CPs†

Instruction 174xx0 in process, unit busy for
(VL) + 9 CPs†

Instruction 070 in process, unit busy (070 issue
time) + 7 CPs†

EXECUTION TIME: Instruction issue, 1 CP

Vj ready in (VL) + 3 CPs if data available†

Vi ready in (VL) + 10 CPs if data available†

Unit ready, (VL) + 4 CPs if data available†



† Vector instructions may or may not start execution immediately; they
execute as data becomes available. In particular, a memory conflict
that slows execution of some elements of a vector load can cause
delays in all instructions in the operation chain, starting with that
load.



INSTRUCTION 175



CAL Syntax      Description                                      Octal Code

VM Vj,Z         VM=1 when (Vj element)=0                         1750j0

VM Vj,N         VM=1 when (Vj element)≠0                         1750j1

VM Vj,P         VM=1 when (Vj element) positive                  1750j2
                (bit 2^63=0), includes (Vj element)=0

VM Vj,M         VM=1 when (Vj element) negative                  1750j3
                (bit 2^63=1)

Vi,VM Vj,Z      VM=1 and (Vi compress element)=element           175ij4
                index when (Vj element)=0

Vi,VM Vj,N      VM=1 and (Vi compress element)=element           175ij5
                index when (Vj element)≠0

Vi,VM Vj,P      VM=1 and (Vi compress element)=element           175ij6
                index when (Vj element) positive
                (bit 2^63=0), includes (Vj element)=0

Vi,VM Vj,M      VM=1 and (Vi compress element)=element           175ij7
                index when (Vj element) negative
                (bit 2^63=1)



The Full Vector Logical functional unit executes the vector mask and
compress index instruction 175.

Instruction 1750jk, where k=0 through 3, creates a vector mask in VM
based on the results of testing the contents of the elements of register
Vj. Each bit of VM corresponds to an element of Vj. Bit 2^63
corresponds to element 0; bit 2^0 corresponds to element 63.

Instruction 175ijk, where k=4 through 7, creates the same vector
mask as 1750jk and in addition creates a compressed index list in
register Vi based on the results of testing the contents of the
elements of register Vj (refer to the example).

The type of test made by the instruction depends on the low-order 2 bits
of the k designator. The high-order bit of the k designator is used to
select the compress index option.

If the k designator is 0, the VM bit is set to 1 when (Vj element) is
0 and is set to 0 when (Vj element) is nonzero.

If the k designator is 1, the VM bit is set to 1 when (Vj element) is
nonzero and is set to 0 when (Vj element) is 0.

If the k designator is 2, the VM bit is set to 1 when (Vj element) is
positive and is set to 0 when (Vj element) is negative. A zero value
is considered positive.

If the k designator is 3, the VM bit is set to 1 when (Vj element) is
negative and is set to 0 when (Vj element) is positive. A zero value
is considered positive.

If the k designator is 4, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is 0.
Register Vi elements are written to and Vi element pointer advanced
only when (Vj element) is 0.

If the k designator is 5, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is
nonzero. Register Vi elements are written to and Vi element pointer
advanced only when (Vj element) is nonzero.

If the k designator is 6, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is
positive. Register Vi elements are written to and Vi element pointer
advanced only when (Vj element) is positive. A zero value is
considered positive.

If the k designator is 7, the VM bit is set to 1 and register (Vi
compress element) is set to Vj element index when (Vj element) is
negative. Register Vi elements are written to and Vi element pointer
advanced only when (Vj element) is negative.

The number of elements tested is determined by the VL register contents.
VM bits corresponding to untested elements of Vj are zeroed.

Vector mask instruction 1750jk, k=0 through 3, and compress index
instruction 175ijk, k=4 through 7, provide a vector counterpart to
the scalar conditional branch instructions.
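The mask and compress index behavior can be sketched in Python as follows. This is a model under assumptions, not the hardware implementation: `vector_mask` is a hypothetical name, registers are lists of unsigned 64-bit integers, and the sample data (eleven elements with zeros at indices 0, 2, 5, 6, and 10) is chosen only for illustration.

```python
def vector_mask(vj, vl, k, vi=None):
    """Model instruction 175: return VM and, for k = 4 through 7,
    write the compressed index list into vi in place."""
    tests = {
        0: lambda e: e == 0,            # Z: zero
        1: lambda e: e != 0,            # N: nonzero
        2: lambda e: (e >> 63) == 0,    # P: bit 2^63 = 0, includes zero
        3: lambda e: (e >> 63) == 1,    # M: bit 2^63 = 1
    }
    test = tests[k & 3]                 # low-order 2 bits select the test
    compress = bool(k & 4)              # high-order bit selects compress index
    vm = 0
    pointer = 0                         # Vi element pointer
    for n in range(vl):
        if test(vj[n]):
            vm |= 1 << (63 - n)         # element 0 maps to bit 2^63
            if compress:
                vi[pointer] = n         # written only when the test is true
                pointer += 1
    return vm


# Illustrative data: zeros at element indices 0, 2, 5, 6, and 10
vj = [0, 9, 0, 9, 9, 0, 0, 9, 9, 9, 0]
vi = [-1] * 64                          # -1 marks "unchanged" in this sketch
vm = vector_mask(vj, 11, 4, vi)
```

After the call, `vi` begins with the indices 0, 2, 5, 6, 10 and the rest of the list is untouched; `vm` has the corresponding five bits set, counted from bit 2^63 downward.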



HOLD ISSUE CONDITIONS: Vj reserved as operand

Instruction 14x in process, unit busy
(VL) + 4 CPs

Instruction 175 in process, unit busy
(VL) + 4 CPs

For instruction 175 (k=4 through 7), if
register Vi reserved as operand or result.

EXECUTION TIME: Instruction issue, 1 CP

Vj ready, (VL) + 3 CPs if data available

For instruction 175 (k=4 through 7), Vi ready
in (VL) + 10 CPs if data is available.

Except for instruction 073, VM ready (VL) + 4 CPs
if data available

For instruction 073, VM ready (VL) + 5 CPs if
data available

SPECIAL CASES: k=0 or 4, VM bit xx=1 if (Vj element xx)=0.

k=1 or 5, VM bit xx=1 if (Vj element xx)≠0.

k=2 or 6, VM bit xx=1 if (Vj element xx) is
positive; 0 is a positive condition.

k=3 or 7, VM bit xx=1 if (Vj element xx) is
negative.

k=4, (Vi compress element)=xx if (Vj element
xx)=0.

k=5, (Vi compress element)=xx if (Vj element
xx)≠0.

k=6, (Vi compress element)=xx if (Vj element
xx) is positive; 0 is a positive condition.

k=7, (Vi compress element)=xx if (Vj element
xx) is negative.

For instruction 175 (k=4 through 7), if no test
conditions are true, then (VM)=0 and no writes to
register Vi occur and the elements of Vi are
unchanged by this instruction.






Example:

This example of the compress index instruction 175ij4 generates the same
vector mask as instruction 1750j0 and also generates data into vector
register Vi as follows:

Vector length=13 (octal)

Vector      Register        Vector      Register
Element     Vi Data         Element     Vj Data

00          00              00          Zero
01          02              01          Nonzero
02          05              02          Zero
03          06              03          Nonzero
04          12              04          Nonzero
05          Unchanged       05          Zero
06          Unchanged       06          Zero
.           .               07          Nonzero
.           .               10          Nonzero
.           .               11          Nonzero
                            12          Zero



INSTRUCTIONS 176 - 177



CAL Syntax      Description                                      Octal Code

Vi ,A0,Ak       Transmit (VL) words from memory to Vi            176i0k
                elements starting at memory address (A0) and
                incrementing by (Ak) for successive
                addresses

Vi ,A0,1        Transmit (VL) words from memory to Vi            176i00
                elements starting at memory address (A0) and
                incrementing by 1 for successive addresses

Vi ,A0,Vk       Transmit (VL) words from memory to Vi            176i1k
                elements using memory address (A0) +
                (Vk elements)

,A0,Ak Vj       Transmit (VL) words from Vj elements to          1770jk
                memory starting at memory address (A0) and
                incrementing by (Ak) for successive
                addresses

,A0,1 Vj        Transmit (VL) words from Vj elements to          1770j0
                memory starting at memory address (A0) and
                incrementing by 1 for successive addresses

,A0,Vk Vj       Transmit (VL) words from Vj elements to          1771jk
                memory using memory address (A0) +
                (Vk elements)



Instructions 176 and 177 transfer blocks of data between V registers and
memory.

Instruction 176 transfers data from memory to elements of register Vi.

Instruction 177 transfers data from elements of register Vj to memory.

For instructions 176i0k and 1770jk, register elements begin with 0
and are incremented by 1 for each transfer. Memory addresses begin with
(A0) and are incremented by the contents of Ak. Ak contains a signed
24-bit integer which is added to the address of the current word to
obtain the address of the next word. Ak can specify either a positive
or negative increment, allowing both forward and backward streams of
reference.

For instructions 176i1k and 1771jk, register elements begin with 0
and are incremented by 1 for each transfer. The low-order 24 bits of
each element of Vk contain a signed 24-bit integer which is added to
(A0) to obtain the current memory address.

The number of words transferred is determined by the VL register contents.
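The two addressing forms can be sketched in Python. This is an illustrative model only: the helper names are hypothetical, memory is modeled as a plain list indexed by word address, and address wraparound and range checking are deliberately left out.

```python
def sign24(value):
    """Interpret the low-order 24 bits of value as a signed integer."""
    value &= 0xFFFFFF
    return value - (1 << 24) if value & 0x800000 else value


def strided_addresses(a0, ak, vl, k=1):
    """Addresses for 176i0k / 1770jk: start at (A0), step by (Ak).
    The increment is taken as 1 when the k designator is 0."""
    step = 1 if k == 0 else sign24(ak)
    return [a0 + n * step for n in range(vl)]


def indexed_addresses(a0, vk, vl):
    """Addresses for 176i1k / 1771jk: (A0) plus each Vk element."""
    return [a0 + sign24(vk[n]) for n in range(vl)]


def vector_load(memory, addresses):
    """Instruction 176: memory words to Vi elements 0, 1, 2, ..."""
    return [memory[address] for address in addresses]


def vector_store(memory, addresses, vj):
    """Instruction 177: Vj elements 0, 1, 2, ... to memory words."""
    for address, element in zip(addresses, vj):
        memory[address] = element
```

For instance, `strided_addresses(100, 0xFFFFFE, 3)` walks backward through memory because the 24-bit value is read as -2.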



HOLD ISSUE CONDITIONS: For instruction 176 if Ports A and B busy

For instruction 177 if Port C busy

For instructions 176i1k and 1771jk, if
176i1k or 1771jk in progress

A0 reserved

For instructions 176i0k and 1770jk, if Ak
reserved where k=1 through 7

Scalar reference in CP1, CP2, CP3, or CP4

For instruction 176, V register i reserved as
operand or result

For instruction 177, V register j reserved as
operand

For instructions 176i1k and 1771jk, V register
k reserved as operand

If not bidirectional memory mode, then
instruction 176 holds on Port C busy and
instruction 177 holds on Port A or B busy.

EXECUTION TIME: For instruction 176i0k:
Instruction issue, 1 CP
Vi ready, (VL) + 17 CPs if memory is available
Port A or B busy, (VL) + 6 CPs

For instruction 1770jk:
Instruction issue, 1 CP
Vj ready, (VL) + 3 CPs if data is available
Port C busy, (VL) + 7 CPs

For instruction 176i1k:
Instruction issue, 1 CP
Vi ready, (VL) + 21 CPs if memory is available
Vk ready, (VL) + 3 CPs if data is available
Port A or B busy, (VL) + 10 CPs
176i1k busy, (VL) + 10 CPs

For instruction 1771jk:
Instruction issue, 1 CP
Vj and Vk ready, (VL) + 3 CPs if data is
available
Port C busy, (VL) + 10 CPs
1771jk busy, (VL) + 10 CPs

SPECIAL CASES: For instructions 176i0k and 1770jk,
increment (Ak)=1 if k=0.

Instruction 176 uses Port B. If Port B is busy
at issue time, instruction 176 uses Port A and
instruction 177 uses Port C.

For instructions 176i0k and 1770jk:
(Ak) determines the memory increment.
Successive addresses are located in successive
banks. References to the same bank can be made
every 4 CPs or more. Incrementing (Ak) by 64
places successive memory references in the same
bank, so a word is transferred every 4 CPs or
more. If the address is incremented by 32,
every other reference is to the same bank, and
words can transfer no faster than one every 2
CPs. With any address incrementing that allows
4 CPs before addressing the same bank, the
words can transfer each CP.

Memory conflict can slow loading or storing of
individual vector elements. The elements are
loaded or stored in order, so any delay for any
element delays all succeeding elements.

For instruction 176:

If there is an instruction using its
destination register as a source, the execution
of that instruction is delayed whenever there
is a delay in instruction 176 results.
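The effect of the address increment on transfer rate can be illustrated with a small Python simulation. It is a simplified sketch, not a timing model of the machine: it assumes 64 banks selected by the word address modulo 64 (inferred from the statement that an increment of 64 stays in one bank), a 4 CP bank busy time, in-order issue of at most one reference per CP, and no other memory traffic. The name `transfer_cycles` is hypothetical.

```python
def transfer_cycles(stride, n_words, n_banks=64, bank_busy=4):
    """Clock periods needed to issue n_words references in order."""
    free_at = [0] * n_banks       # earliest CP at which each bank is free
    clock = 0                     # earliest CP for the next reference
    address = 0
    for _ in range(n_words):
        bank = address % n_banks
        start = max(clock, free_at[bank])   # wait if the bank is still busy
        free_at[bank] = start + bank_busy
        clock = start + 1                   # any delay delays all successors
        address += stride
    return clock
```

With these assumptions, a stride of 1 issues one word per CP, a stride of 32 averages one word every 2 CPs, and a stride of 64 drops to one word every 4 CPs.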
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APPENDIX SECTION 



INSTRUCTION SUMMARY FOR 
CRAY X-MP FOUR-PROCESSOR 
COMPUTER SYSTEMS 



Instructions for the CRAY X-MP four-processor computer systems are listed 
in numerical order on the following pages. The following abbreviations 
are used: 

Abbreviation Definition 

Pop/LZ Scalar Population/Parity/Leading Zero functional unit 

A Int Add Address Add functional unit 

A Int Mult Address Multiply functional unit 

S Logical Scalar Logical functional unit 

S Shift Scalar Shift functional unit 

S Int Add Scalar Add functional unit 

Fp Add Floating-point Add functional unit 

Fp Mult Floating-point Multiply functional unit 

Fp Rcpl Reciprocal Approximation functional unit 

V Logical Vector Logical functional unit 


CRAY X-MP   CAL             Unit        Description

000000      ERR             -           Error exit
0010jk†     CA,Aj Ak        -           Set the channel (Aj) current address to (Ak)
                                        and begin the I/O sequence
0011jk†     CL,Aj Ak        -           Set the channel (Aj) limit address to (Ak)
0012j0†     CI,Aj           -           Clear Channel (Aj) Interrupt flag; clear
                                        device master-clear (output channel).
0012j1†     MC,Aj           -           Clear Channel (Aj) Interrupt flag; set device
                                        master-clear (output channel); clear device
                                        ready-held (input channel).
0013j0†     XA Aj           -           Enter XA register with (Aj)
0014j0†     RT Sj           -           Enter RTC register with (Sj)
0014j1†     IP,j 1          -           Set interprocessor interrupt
001402†     IP 0            -           Clear interprocessor interrupt
001403†     CLN 0           -           Enter CLN register with 0
001413†     CLN 1           -           Enter CLN register with 1
001423†     CLN 2           -           Enter CLN register with 2
001433†     CLN 3           -           Enter CLN register with 3
001443†     CLN 4           -           Enter CLN register with 4
001453†     CLN 5           -           Enter CLN register with 5
0014j4†     PCI Sj          -           Enter II register with (Sj)
001405†     CCI             -           Clear PCI request
001406†     ECI             -           Enable PCI request
001407†     DCI             -           Disable PCI request
0015j0†     ††              -           Select performance monitor
001501†     ††              -           Set maintenance read mode
001511†     ††              -           Load diagnostic checkbyte with S1
001521†     ††              -           Set maintenance write mode 1
001531†     ††              -           Set maintenance write mode 2
00200k      VL Ak           -           Transmit (Ak) to VL register
002000†††   VL 1            -           Transmit 1 to VL register
002100      EFI             -           Enable interrupt on floating-point error
002200      DFI             -           Disable interrupt on floating-point error
002300      ERI             -           Enable operand range interrupts
002400      DRI             -           Disable operand range interrupts
002500      DBM             -           Disable bidirectional memory transfers

†   Privileged to monitor mode 
††  Not currently supported 
††† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

002600      EBM             -           Enable bidirectional memory transfers
002700      CMR             -           Complete memory references
0030j0      VM Sj           -           Transmit (Sj) to VM register
003000†     VM 0            -           Clear VM register
0034jk      SMjk 1,TS       -           Test & set semaphore jk in SM
0036jk      SMjk 0          -           Clear semaphore jk in SM
0037jk      SMjk 1          -           Set semaphore jk in SM
004000      EX              -           Normal exit
0050jk      J Bjk           -           Jump to (Bjk)
006ijkm     J exp           -           Jump to exp
007ijkm     R exp           -           Return jump to exp; set B00 to P.
010ijkm     JAZ exp         -           Branch to exp if (A0)=0
011ijkm     JAN exp         -           Branch to exp if (A0)≠0
012ijkm     JAP exp         -           Branch to exp if (A0) positive; 0 is positive.
013ijkm     JAM exp         -           Branch to exp if (A0) negative
014ijkm     JSZ exp         -           Branch to exp if (S0)=0
015ijkm     JSN exp         -           Branch to exp if (S0)≠0
016ijkm     JSP exp         -           Branch to exp if (S0) positive; 0 is positive.
017ijkm     JSM exp         -           Branch to exp if (S0) negative
01hijkm     Ah exp          -           Transmit exp=ijkm to Ah
020ijkm     Ai exp          -           Transmit exp=jkm to Ai
021ijkm     Ai exp          -           Transmit exp=ones complement of jkm to Ai
022ijk      Ai exp          -           Transmit exp=jk to Ai
023ij0      Ai Sj           -           Transmit (Sj) to Ai
023i01      Ai VL           -           Transmit (VL) to Ai
024ijk      Ai Bjk          -           Transmit (Bjk) to Ai
025ijk      Bjk Ai          -           Transmit (Ai) to Bjk
026ij0      Ai PSj          Pop/LZ      Population count of (Sj) to Ai
026ij1      Ai QSj          Pop/LZ      Population count parity of (Sj) to Ai
026ij7      Ai SBj          -           Transmit (SBj) to Ai
027ij0      Ai ZSj          Pop/LZ      Leading zero count of (Sj) to Ai
027ij7      SBj Ai          -           Transmit (Ai) to SBj
030ijk      Ai Aj+Ak        A Int Add   Integer sum of (Aj) and (Ak) to Ai
030i0k†     Ai Ak           A Int Add   Transmit (Ak) to Ai
030ij0†     Ai Aj+1         A Int Add   Integer sum of (Aj) and 1 to Ai
031ijk      Ai Aj-Ak        A Int Add   Integer difference of (Aj) less (Ak) to Ai

† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

031i00†     Ai -1           A Int Add   Transmit -1 to Ai
031i0k†     Ai -Ak          A Int Add   Transmit the negative of (Ak) to Ai
031ij0†     Ai Aj-1         A Int Add   Integer difference of (Aj) less 1 to Ai
032ijk      Ai Aj*Ak        A Int Mult  Integer product of (Aj) and (Ak) to Ai
033i00      Ai CI           -           Channel number to Ai (j=0)
033ij0      Ai CA,Aj        -           Address of channel (Aj) to Ai (j≠0; k=0)
033ij1      Ai CE,Aj        -           Error flag of channel (Aj) to Ai (j≠0; k=1)
034ijk      Bjk,Ai ,A0      Memory      Read (Ai) words to B register jk from (A0)
034ijk†     Bjk,Ai 0,A0     Memory      Read (Ai) words to B register jk from (A0)
035ijk      ,A0 Bjk,Ai      Memory      Store (Ai) words at B register jk to (A0)
035ijk†     0,A0 Bjk,Ai     Memory      Store (Ai) words at B register jk to (A0)
036ijk      Tjk,Ai ,A0      Memory      Read (Ai) words to T register jk from (A0)
036ijk†     Tjk,Ai 0,A0     Memory      Read (Ai) words to T register jk from (A0)
037ijk      ,A0 Tjk,Ai      Memory      Store (Ai) words at T register jk to (A0)
037ijk†     0,A0 Tjk,Ai     Memory      Store (Ai) words at T register jk to (A0)
040ijkm     Si exp          -           Transmit jkm to Si
041ijkm     Si exp          -           Transmit exp=ones complement of jkm to Si
042ijk      Si <exp         S Logical   Form ones mask exp bits in Si from the right;
                                        jk field gets 64 - exp.
042ijk†     Si #>exp        S Logical   Form zeros mask exp bits in Si from the left;
                                        jk field gets 64 - exp.
042i77†     Si 1            S Logical   Enter 1 into Si
042i00†     Si -1           S Logical   Enter -1 into Si
043ijk      Si >exp         S Logical   Form ones mask exp bits in Si from the left;
                                        jk field gets exp.
043ijk†     Si #<exp        S Logical   Form zeros mask exp bits in Si from the right;
                                        jk field gets 64 - exp.

† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

043i00†     Si 0            S Logical   Clear Si
044ijk      Si Sj&Sk        S Logical   Logical product of (Sj) and (Sk) to Si
044ij0†     Si Sj&SB        S Logical   Sign bit of (Sj) to Si
044ij0†     Si SB&Sj        S Logical   Sign bit of (Sj) to Si (j≠0)
045ijk      Si #Sk&Sj       S Logical   Logical product of (Sj) and ones complement
                                        of (Sk) to Si
045ij0†     Si #SB&Sj       S Logical   (Sj) with sign bit cleared to Si
046ijk      Si Sj\Sk        S Logical   Logical difference of (Sj) and (Sk) to Si
046ij0†     Si Sj\SB        S Logical   Toggle sign bit of Sj, then enter into Si
046ij0†     Si SB\Sj        S Logical   Toggle sign bit of Sj, then enter into Si (j≠0)
047ijk      Si #Sj\Sk       S Logical   Logical equivalence of (Sk) and (Sj) to Si
047i0k†     Si #Sk          S Logical   Transmit ones complement of (Sk) to Si
047ij0†     Si #Sj\SB       S Logical   Logical equivalence of (Sj) and sign bit to Si
047ij0†     Si #SB\Sj       S Logical   Logical equivalence of (Sj) and sign bit to Si
                                        (j≠0)
047i00†     Si #SB          S Logical   Enter ones complement of sign bit into Si
050ijk      Si Sj!Si&Sk     S Logical   Logical product of (Si) and (Sk) complement
                                        ORed with logical product of (Sj) and (Sk)
                                        to Si
050ij0†     Si Sj!Si&SB     S Logical   Scalar merge of (Si) and sign bit of (Sj) to Si
051ijk      Si Sj!Sk        S Logical   Logical sum of (Sj) and (Sk) to Si
051i0k†     Si Sk           S Logical   Transmit (Sk) to Si
051ij0†     Si Sj!SB        S Logical   Logical sum of (Sj) and sign bit to Si
051ij0†     Si SB!Sj        S Logical   Logical sum of (Sj) and sign bit to Si (j≠0)
051i00†     Si SB           S Logical   Enter sign bit into Si
052ijk      S0 Si<exp       S Shift     Shift (Si) left exp=jk places to S0
053ijk      S0 Si>exp       S Shift     Shift (Si) right exp=64 - jk places to S0
054ijk      Si Si<exp       S Shift     Shift (Si) left exp=jk places

† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

055ijk      Si Si>exp       S Shift     Shift (Si) right exp=64 - jk places
056ijk      Si Si,Sj<Ak     S Shift     Shift (Si and Sj) left (Ak) places to Si
056ij0†     Si Si,Sj<1      S Shift     Shift (Si and Sj) left one place to Si
056i0k†     Si Si<Ak        S Shift     Shift (Si) left (Ak) places to Si
057ijk      Si Sj,Si>Ak     S Shift     Shift (Sj and Si) right (Ak) places to Si
057ij0†     Si Sj,Si>1      S Shift     Shift (Sj and Si) right one place to Si
057i0k†     Si Si>Ak        S Shift     Shift (Si) right (Ak) places to Si
060ijk      Si Sj+Sk        S Int Add   Integer sum of (Sj) and (Sk) to Si
061ijk      Si Sj-Sk        S Int Add   Integer difference of (Sj) and (Sk) to Si
061i0k†     Si -Sk          S Int Add   Transmit negative of (Sk) to Si
062ijk      Si Sj+FSk       Fp Add      Floating-point sum of (Sj) and (Sk) to Si
062i0k†     Si +FSk         Fp Add      Normalize (Sk) to Si
063ijk      Si Sj-FSk       Fp Add      Floating-point difference of (Sj) and (Sk)
                                        to Si
063i0k†     Si -FSk         Fp Add      Transmit normalized negative of (Sk) to Si
064ijk      Si Sj*FSk       Fp Mult     Floating-point product of (Sj) and (Sk) to Si
065ijk      Si Sj*HSk       Fp Mult     Half-precision rounded floating-point product
                                        of (Sj) and (Sk) to Si
066ijk      Si Sj*RSk       Fp Mult     Full-precision rounded floating-point product
                                        of (Sj) and (Sk) to Si
067ijk      Si Sj*ISk       Fp Mult     2-floating-point product of (Sj) and (Sk)
                                        to Si
070ij0      Si /HSj         Fp Rcpl     Floating-point reciprocal approximation of
                                        (Sj) to Si
071i0k      Si Ak           -           Transmit (Ak) to Si with no sign extension
071i1k      Si +Ak          -           Transmit (Ak) to Si with sign extension
071i2k      Si +FAk         -           Transmit (Ak) to Si as unnormalized
                                        floating-point number

† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

071i30      Si 0.6          -           Transmit constant 0.75*2**48 to Si
071i40      Si 0.4          -           Transmit constant 0.5 to Si
071i50      Si 1.           -           Transmit constant 1.0 to Si
071i60      Si 2.           -           Transmit constant 2.0 to Si
071i70      Si 4.           -           Transmit constant 4.0 to Si
072i00      Si RT           -           Transmit (RTC) to Si
072i02      Si SM           -           Transmit (SM) to Si
072ij3      Si STj          -           Transmit (STj) to Si
073i00      Si VM           -           Transmit (VM) to Si
073i11      †               -           Read counter into Si
073i21      †               -           Increment performance counter (maintenance)
073i31      †               -           Clear all maintenance modes
073i01      Si SR0          -           Transmit (SR0) to Si
073i02      SM Si           -           Transmit (Si) to SM
073ij3      STj Si          -           Transmit (Si) to STj
074ijk      Si Tjk          -           Transmit (Tjk) to Si
075ijk      Tjk Si          -           Transmit (Si) to Tjk
076ijk      Si Vj,Ak        -           Transmit (Vj, element (Ak)) to Si
077ijk      Vi,Ak Sj        -           Transmit (Sj) to Vi element (Ak)
077i0k††    Vi,Ak 0         -           Clear Vi element (Ak)
10hijkm     Ai exp,Ah       Memory      Read from ((Ah) + exp) to Ai (A0=0)
100ijkm††   Ai exp,0        Memory      Read from (exp) to Ai
100ijkm††   Ai exp,         Memory      Read from (exp) to Ai
10hi000††   Ai ,Ah          Memory      Read from (Ah) to Ai
11hijkm     exp,Ah Ai       Memory      Store (Ai) to (Ah) + exp (A0=0)
110ijkm††   exp,0 Ai        Memory      Store (Ai) to exp
110ijkm††   exp, Ai         Memory      Store (Ai) to exp
11hi000††   ,Ah Ai          Memory      Store (Ai) to (Ah)
12hijkm     Si exp,Ah       Memory      Read from ((Ah) + exp) to Si (A0=0)
120ijkm††   Si exp,0        Memory      Read from exp to Si
120ijkm††   Si exp,         Memory      Read from exp to Si
12hi000††   Si ,Ah          Memory      Read from (Ah) to Si
13hijkm     exp,Ah Si       Memory      Store (Si) to (Ah) + exp (A0=0)
130ijkm††   exp,0 Si        Memory      Store (Si) to exp
130ijkm††   exp, Si         Memory      Store (Si) to exp
13hi000††   ,Ah Si          Memory      Store (Si) to (Ah)
140ijk      Vi Sj&Vk        V Logical   Logical products of (Sj) and (Vk) to Vi
141ijk      Vi Vj&Vk        V Logical   Logical products of (Vj) and (Vk) to Vi


†  Not currently supported 
†† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

142ijk      Vi Sj!Vk        V Logical   Logical sums of (Sj) and (Vk) to Vi
142i0k†     Vi Vk           V Logical   Transmit (Vk) to Vi
143ijk      Vi Vj!Vk        V Logical   Logical sums of (Vj) and (Vk) to Vi
144ijk      Vi Sj\Vk        V Logical   Logical differences of (Sj) and (Vk) to Vi
145ijk      Vi Vj\Vk        V Logical   Logical differences of (Vj) and (Vk) to Vi
145iii†     Vi 0            V Logical   Clear Vi
146ijk      Vi Sj!Vk&VM     V Logical   Transmit (Sj) if VM bit=1; (Vk) if VM bit=0
                                        to Vi.
146i0k†     Vi #VM&Vk       V Logical   Vector merge of (Vk) and 0 to Vi
147ijk      Vi Vj!Vk&VM     V Logical   Transmit (Vj) if VM bit=1; (Vk) if VM bit=0
                                        to Vi.
150ijk      Vi Vj<Ak        V Shift     Shift (Vj) left (Ak) places to Vi
150ij0†     Vi Vj<1         V Shift     Shift (Vj) left one place to Vi
151ijk      Vi Vj>Ak        V Shift     Shift (Vj) right (Ak) places to Vi
151ij0†     Vi Vj>1         V Shift     Shift (Vj) right one place to Vi
152ijk      Vi Vj,Vj<Ak     V Shift     Double shift (Vj) left (Ak) places to Vi
152ij0†     Vi Vj,Vj<1      V Shift     Double shift (Vj) left one place to Vi
153ijk      Vi Vj,Vj>Ak     V Shift     Double shift (Vj) right (Ak) places to Vi
153ij0†     Vi Vj,Vj>1      V Shift     Double shift (Vj) right one place to Vi
154ijk      Vi Sj+Vk        V Int Add   Integer sums of (Sj) and (Vk) to Vi
155ijk      Vi Vj+Vk        V Int Add   Integer sums of (Vj) and (Vk) to Vi
156ijk      Vi Sj-Vk        V Int Add   Integer differences of (Sj) and (Vk) to Vi
156i0k†     Vi -Vk          V Int Add   Transmit negative of (Vk) to Vi
157ijk      Vi Vj-Vk        V Int Add   Integer differences of (Vj) and (Vk) to Vi
160ijk      Vi Sj*FVk       Fp Mult     Floating-point products of (Sj) and (Vk) to Vi
161ijk      Vi Vj*FVk       Fp Mult     Floating-point products of (Vj) and (Vk) to Vi

† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

162ijk      Vi Sj*HVk       Fp Mult     Half-precision rounded floating-point products
                                        of (Sj) and (Vk) to Vi
163ijk      Vi Vj*HVk       Fp Mult     Half-precision rounded floating-point products
                                        of (Vj) and (Vk) to Vi
164ijk      Vi Sj*RVk       Fp Mult     Rounded floating-point products of (Sj) and
                                        (Vk) to Vi
165ijk      Vi Vj*RVk       Fp Mult     Rounded floating-point products of (Vj) and
                                        (Vk) to Vi
166ijk      Vi Sj*IVk       Fp Mult     2-floating-point products of (Sj) and (Vk)
                                        to Vi
167ijk      Vi Vj*IVk       Fp Mult     2-floating-point products of (Vj) and (Vk)
                                        to Vi
170ijk      Vi Sj+FVk       Fp Add      Floating-point sums of (Sj) and (Vk) to Vi
170i0k†     Vi +FVk         Fp Add      Normalize (Vk) to Vi
171ijk      Vi Vj+FVk       Fp Add      Floating-point sums of (Vj) and (Vk) to Vi
172ijk      Vi Sj-FVk       Fp Add      Floating-point differences of (Sj) and (Vk)
                                        to Vi
172i0k†     Vi -FVk         Fp Add      Transmit normalized negatives of (Vk) to Vi
173ijk      Vi Vj-FVk       Fp Add      Floating-point differences of (Vj) and (Vk)
                                        to Vi
174ij0      Vi /HVj         Fp Rcpl     Floating-point reciprocal approximations of
                                        (Vj) to Vi
174ij1      Vi PVj          V Pop       Population counts of (Vj) to Vi
174ij2      Vi QVj          V Pop       Population count parities of (Vj) to Vi
1750j0      VM Vj,Z         V Logical   VM=1 where (Vj)=0
1750j1      VM Vj,N         V Logical   VM=1 where (Vj)≠0
1750j2      VM Vj,P         V Logical   VM=1 if (Vj) positive; 0 is positive.
1750j3      VM Vj,M         V Logical   VM=1 if (Vj) negative
175ij4      Vi,VM Vj,Z      V Logical   VM=1 and (Vi)=element index if (Vj)=0
175ij5      Vi,VM Vj,N      V Logical   VM=1 and (Vi)=element index if (Vj)≠0
175ij6      Vi,VM Vj,P      V Logical   VM=1 and (Vi)=element index if (Vj) positive
175ij7      Vi,VM Vj,M      V Logical   VM=1 and (Vi)=element index if (Vj) negative
176i0k      Vi ,A0,Ak       Memory      Read (VL) words to Vi from (A0) incremented
                                        by (Ak)

† Special CAL syntax 


CRAY X-MP   CAL             Unit        Description

176i00†     Vi ,A0,1        Memory      Read (VL) words to Vi from (A0) incremented
                                        by 1
176i1k      Vi ,A0,Vk       Memory      Read (VL) words to Vi using (A0) + (Vk)
1770jk      ,A0,Ak Vj       Memory      Store (VL) words from Vj to (A0) incremented
                                        by (Ak)
1770j0†     ,A0,1 Vj        Memory      Store (VL) words from Vj to (A0) incremented
                                        by 1
1771jk      ,A0,Vk Vj       Memory      Store (VL) words from Vj using (A0) + (Vk)


† Special CAL syntax 



6 MBYTE PER SECOND CHANNEL DESCRIPTIONS                                    B 



Each input or output 6 Mbyte per second channel directly accesses Central 
Memory. Input channels store external data in memory and output channels 
read data from memory. A primary task of a channel is to convert 64-bit 
Central Memory words into 16-bit parcels or 16-bit parcels into 64-bit 
Central Memory words. Four parcels make up one Central Memory word with 
bits of the parcels assigned to memory bit positions (refer to section 2). 
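
The word/parcel conversion can be illustrated in a few lines of Python. This is a sketch for explanation only, not a description of the channel hardware. It assumes the parcel carrying bits 2^63 through 2^48 is transferred first, as tables B-1 and B-2 show; the function names are hypothetical.

```python
def word_to_parcels(word):
    """Disassemble a 64-bit word into four 16-bit parcels, high-order first."""
    if not 0 <= word < 1 << 64:
        raise ValueError("word must fit in 64 bits")
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def parcels_to_word(parcels):
    """Assemble four 16-bit parcels (high-order first) into a 64-bit word."""
    if len(parcels) != 4:
        raise ValueError("exactly four parcels make up one word")
    word = 0
    for parcel in parcels:
        word = (word << 16) | (parcel & 0xFFFF)
    return word
```

An output channel performs the first conversion on each word it reads from memory; an input channel performs the second before writing the word.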

Each input or output channel has a data channel (4 parity bits, 16 data 
bits, and 3 control lines), a 64-bit assembly or disassembly register, a 
channel Current Address (CA) register, and a channel Limit Address (CL) 
register. 

Three control signals (Ready, Resume, and Disconnect) coordinate the 
transfer of parcels over the channels. In addition to the three control 
signals, the output channel of the pair has a Master Clear line. 

This appendix describes the signal sequence of a 6 Mbyte per second input 
channel and an output channel. 



6 MBYTE PER SECOND INPUT CHANNEL SIGNAL SEQUENCE 

Table B-1 shows a general view of a 6 Mbyte per second input channel 
signal sequence. The following paragraphs describe data bits, parity 
bits, and each signal in the sequence. 



DATA BITS 2^0 THROUGH 2^15 

Data bits 2^0 through 2^15 are signals carrying the 16-bit parcel of 
data from the external device to Central Memory. The data bits must all 
be valid within 2 5 ns after the leading edge of the Ready signal. Data 
bit signals must remain unchanged on the lines until the corresponding 
Resume signal is received by the external device. Normally, data is sent 
coincidentally with the Ready signal and is held until the subsequent 
Ready signal. 


Table B-1. Input Channel Signal Exchange 

Step   Central Memory           Channel   External Equipment

1      Activate channel
       (set CL and CA)
2†                                 ←      Data 2^63 - 2^48 with Ready
3      Resume                      →
4                                  ←      Data 2^47 - 2^32 with Ready
5      Resume                      →
6                                  ←      Data 2^31 - 2^16 with Ready
7      Resume                      →
8                                  ←      Data 2^15 - 2^0 with Ready
9      Write word to memory
       and advance
       current address.
10a    Resume                      →
10b    If (CA)=(CL),
       go to step 13.
11                                        If more data, go to step 2.
12                                 ←      Disconnect (ignored if
                                          CA=CL or if channel
                                          not active)
13     Set interrupt and
       deactivate channel

† Step 2 can initially precede step 1; that is, the first parcel and 
ready signal can arrive before requested. 
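
The exchange in the table can be followed step by step in a short model. The sketch below is illustrative Python under simplifying assumptions: memory is a dictionary keyed by address, the external device is an iterable of parcels, exhausting the iterable stands in for the Disconnect signal, and a partially assembled word is simply dropped (the text here does not say what the hardware does with it). The function name is hypothetical.

```python
def run_input_channel(memory, ca, cl, parcels):
    """Model steps 2 through 13 of the input channel exchange.

    memory  -- dict mapping word address to 64-bit value (assumption)
    ca, cl  -- channel Current Address and Limit Address
    parcels -- iterable of 16-bit parcels; each item models data + Ready
    Returns (final_ca, interrupt_set).
    """
    assembly = 0   # 64-bit assembly register
    count = 0      # parcels received for the current word
    for parcel in parcels:
        assembly = ((assembly << 16) | (parcel & 0xFFFF)) & ((1 << 64) - 1)
        count += 1                      # the channel answers with Resume
        if count == 4:
            memory[ca] = assembly       # step 9: write word, advance CA
            ca += 1
            assembly, count = 0, 0
            if ca == cl:                # step 10b: limit reached
                break
    return ca, True                     # step 13: interrupt, deactivate
```

For example, twelve parcels sent to a channel with two words of buffer space fill both words and stop; the last four parcels are never consumed.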



PARITY BITS 0 THROUGH 3 

Parity bits 0, 1, 2, and 3 are each assigned to a 4-bit group of data 
bits. The parity bits are set or cleared to give the bit group odd 
parity. Bit assignments follow. 

Parity Bit     Data Bits 

    0          2^0 through 2^3 

    1          2^4 through 2^7 

    2          2^8 through 2^11 

    3          2^12 through 2^15 

Parity bits are sent from the external device to Central Memory at the 
same time as data bits and are held stable in the same way as the data 
bits. 
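
The parity assignment can be expressed compactly. The following Python sketch assumes that "odd parity" means each 4-bit data group plus its parity bit contains an odd number of ones; the function name is hypothetical.

```python
def parity_bits(parcel):
    """Return [p0, p1, p2, p3], the odd-parity bits for a 16-bit parcel.

    Parity bit n covers data bits 2^(4n) through 2^(4n+3).
    """
    bits = []
    for group in range(4):
        nibble = (parcel >> (4 * group)) & 0xF
        ones = bin(nibble).count("1")
        # Set the parity bit when the data group has an even number of
        # ones, so that data bits plus parity bit total an odd count.
        bits.append(1 if ones % 2 == 0 else 0)
    return bits
```

A receiver can apply the same function to the data bits it samples and compare the result with the parity bits received.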



READY SIGNAL 

The Ready signal sent to Central Memory indicates a parcel of data is 
being sent to the Central Memory input channel and can be sampled. A 
Ready signal is a pulse 50 ±10 ns wide (at 50-percent voltage points). 
The leading edge of the Ready signal at Central Memory begins the timing 
for sampling the data bits. 



RESUME SIGNAL 

The Resume signal is sent from Central Memory to the external device 
showing that the parcel was received and Central Memory is ready for the 
next data transmission. A Resume signal is a pulse 50 ±8 ns wide (at 
50-percent voltage points). 



DISCONNECT SIGNAL 

The Disconnect signal is sent from the external device to Central Memory 
and indicates transmission from the external device is complete. The 
Disconnect signal is sent after the Resume signal is received for the 
last Ready signal. A Disconnect signal is a pulse 50 ±10 ns wide (at 
50-percent voltage points). 



6 MBYTE PER SECOND OUTPUT CHANNEL SIGNAL SEQUENCE 

Table B-2 shows a general view of a 6 Mbyte per second output channel 
signal sequence. The data bits, parity bits, and each signal in the 
sequence are described following the table. 


Table B-2. Output Channel Signal Exchange 

Step   Central Memory           Channel   External Equipment

1      Activate channel
       (set CL and CA)
2      Read word from
       memory and advance
       current address
3      Data 2^63 - 2^48            →
       with Ready
4                                  ←      Resume
5      Data 2^47 - 2^32            →
       with Ready
6                                  ←      Resume
7      Data 2^31 - 2^16            →
       with Ready
8                                  ←      Resume
9      Data 2^15 - 2^0             →
       with Ready
10                                 ←      Resume
11     If (CA)≠(CL),
       go to step 2
12     Disconnect                  →
13     Set interrupt and
       deactivate channel



DATA BITS 2^0 THROUGH 2^15 

Data bits 2^0 through 2^15 are signals carrying a 16-bit parcel of data 
from Central Memory to an external device. The data bits are sent 
concurrently within 5 ns of the leading edge of the Ready signal. Data 
bit signals remain steady on the lines until the Resume signal is 
received. 



PARITY BITS 0 THROUGH 3 

Parity bits 0, 1, 2, and 3 are each assigned to a 4-bit group of data 
bits. The parity bits are set or cleared to give the bit group odd 
parity. Bit assignments follow: 

Parity Bit     Data Bits 

    0          2^0 through 2^3 

    1          2^4 through 2^7 

    2          2^8 through 2^11 

    3          2^12 through 2^15 

Parity bits are sent from Central Memory to the external device at the 
same time as the data bits and are held stable in the same way as the 
data bits. 



READY SIGNAL 

The Ready signal sent from Central Memory to the external device 
indicates data is present and can be sampled. A Ready signal is a pulse 
50 ±8 ns wide (at 50-percent voltage points). The leading edge of the 
Ready signal can be used to time data sampling in the external device. 



RESUME SIGNAL 

The Resume signal is sent from the external device to Central Memory 
showing the parcel was received and the external device is ready for the 
next parcel transmission. A Resume signal is a pulse 50 ±10 ns wide (at 
50-percent voltage points). 



DISCONNECT SIGNAL 

The Disconnect signal is sent from Central Memory to the external device 
and indicates transmission from Central Memory is complete. The 
Disconnect signal is sent after Central Memory receives the Resume signal 
from the last Ready signal. A Disconnect signal is a pulse 50 ±8 ns wide 
(at 50-percent voltage points). 



PERFORMANCE MONITOR 



The system contains a set of eight performance counters that track certain 
hardware-related events; the counts can be used to indicate relative 
performance. The events that can be tracked include the number of specific 
instructions issued, hold issue conditions, and the number of fetches and 
references. They are selected through instruction 0015j0. 
Table C-1 lists all operations that can be monitored. 

Performance monitoring instructions allow the user to select specific 
hardware related events for monitoring, read the results of the 
performance monitors into a scalar register, and test the operation of 
the performance counters. 

The instructions used for performance monitoring are: 

Octal Code     Description 

0015j0         Select performance monitor 

073i11         Read performance counter into Si 

073i21         Increment performance counter (maintenance) 

All instructions are executed in monitor mode. 



SELECTING PERFORMANCE EVENTS 

Instruction 0015j0 selects for monitoring one of the four groups of 
hardware-related events shown in table C-1 and clears all performance 
monitors. The low-order 2 bits of the j field select the group. 

During each CP in nonmonitor (user) mode, the performance counters 
advance their totals according to the number of monitored events that 
occur. Each of the performance counters can increment at a maximum rate 
of +3 per CP. This allows a counter to continuously monitor for 
approximately 62 hours before it is reset. 
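
The 62-hour figure can be reproduced with a short calculation. The Python sketch below takes the counter width as 46 bits (bits 0 through 45, as given in the description of reading the counters) and assumes a clock period of 9.5 ns; the clock period is an assumption, since it is not stated in this appendix.

```python
COUNTER_BITS = 46           # counter bits 0 through 45
MAX_INCREMENT_PER_CP = 3    # maximum advance per clock period
CLOCK_PERIOD_NS = 9.5       # assumption: not stated in this appendix


def hours_to_overflow(bits=COUNTER_BITS,
                      per_cp=MAX_INCREMENT_PER_CP,
                      cp_ns=CLOCK_PERIOD_NS):
    """Hours until a counter wraps when advancing at the maximum rate."""
    clock_periods = (1 << bits) / per_cp
    seconds = clock_periods * cp_ns * 1e-9
    return seconds / 3600.0


if __name__ == "__main__":
    print(round(hours_to_overflow(), 1))
```

A counter that advances by only +1 per CP would take three times as long to wrap under the same assumptions.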

Performance events are monitored only while operating in user 
(nonmonitor) mode. Entering monitor mode disables advancing of the 
performance counters. 



HR-0097 C-l 



Table C-1. Performance Counter Group Descriptions 

Monitor    Performance                                             Increment
Function   Counter       Description                               Per CP

j=0                      Number of:
           0             Instructions issued                       +1
           1             CPs holding issue                         +1
           2             Fetches                                   +1
           3             I/O references                            +1
           4             CPU references                            +3 max
           5             Floating-point add operations             +1
           6             Floating-point multiply operations        +1
           7             Floating-point reciprocal operations      +1

j=1                      Hold issue conditions:
           0             Semaphores                                +1
           1             Shared registers                          +1
           2             A registers and functionals               +1
           3             S registers and functionals               +1
           4             V registers                               +1
           5             V functional units                        +1
           6             Scalar memory                             +1
           7             Block memory                              +1

j=2                      Number of:
           0             Fetches                                   +1
           1             Scalar references                         +1
           2             Scalar conflicts                          +1
           3             I/O references                            +1
           4             I/O conflicts                             +1
           5             Block references                          +3 max
           6             Block conflicts                           +3 max
           7             Vector memory references                  +3 max

j=3                      Number of:
           0             000 - 017 instructions                    +1
           1             020 - 137 instructions                    +1
           2             140 - 157, 175 instructions               +1
           3             160 - 174 instructions                    +1
           4             176, 177 instructions                     +1
           5             Vector integer operations                 +3 max
           6             Vector floating-point operations          +3 max
           7             Vector memory references                  +3 max



READING PERFORMANCE RESULTS 

Performance counter totals can be read using instruction 073i11, which 
transmits either the high-order or low-order bits of a performance 
counter to the high-order bits of scalar register Si according to the 
contents of the performance counter pointer. 

Entering monitor mode disables advancing of all performance counters and 
clears the performance counter pointer. The first execution of a 
073i11 instruction reads the low-order bits of counter 0 into Si and 
increments the performance counter pointer. The second 073i11 
instruction reads the high-order bits of counter 0 into Si and again 
increments the pointer. After each 073i11 instruction, the performance 
counter pointer is advanced by 1. Even values of the pointer select the 
low-order bits of a performance counter to be read into Si; odd values 
of the pointer select the high-order bits of the performance counter to 
be read. 

Low-order bits 0 through 25 of the performance counter are read into bits 
32 through 57 of Si. High-order bits 26 through 45 of the performance 
counter are read into bits 38 through 57 of Si. 
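
Software that reads a counter has to recombine the two halves. The Python sketch below shows one way to do so from the two Si values returned by successive 073i11 instructions. It assumes bit 0 is the least significant bit of both Si and the counter; the function names are hypothetical.

```python
MASK26 = (1 << 26) - 1   # counter bits 0 through 25
MASK20 = (1 << 20) - 1   # counter bits 26 through 45


def low_bits_from_si(si):
    """Counter bits 0-25, delivered in Si bits 32 through 57."""
    return (si >> 32) & MASK26


def high_bits_from_si(si):
    """Counter bits 26-45, delivered in Si bits 38 through 57."""
    return (si >> 38) & MASK20


def counter_value(si_low, si_high):
    """Combine the two reads into the full 46-bit counter total."""
    return (high_bits_from_si(si_high) << 26) | low_bits_from_si(si_low)
```

The first read (even pointer value) supplies si_low and the second read (odd pointer value) supplies si_high for the same counter.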

A sequence for reading a set of performance counters appears as follows 
(there must be a 2-CP delay between sequential 073i11 instructions): 

Step   Octal Code   Description 

1      073i11       Low-order bits of counter 0 to Si 
2                   2 CP delay 
3      073i11       High-order bits of counter 0 to Si 
4                   2 CP delay 
5      073i11       Low-order bits of counter 1 to Si 
6                   2 CP delay 
7      073i11       High-order bits of counter 1 to Si 
8                   2 CP delay 



TESTING PERFORMANCE COUNTERS 

Instruction 073i21 is used to test the operation of the performance 
counters by incrementing the value stored in the counter while in monitor 
mode. 

Entering monitor mode disables advancing of all performance counters by 
user programs and clears the performance counter pointer. This pointer 
determines which performance counter, and which bits in that counter, are 
incremented. Even values of the pointer increment bits 0 and 6 of the 
performance counter when instruction 073i21 is executed; odd values of 
the pointer increment bit 26. The pointer is advanced from even to odd 
and to the next counter through instruction 073i11. 

There must be a 1-CP delay between sequential 073i21 instructions. 

Execution of instruction 073i21 loads register Si with all ones as a 
side effect of the basic 073 instruction. 



SECDED MAINTENANCE FUNCTIONS 



Modules involved with generating and interpreting the 8-bit check byte 
used for single-error-correction/double-error-detection (SECDED) include 
logic that can be used for verifying check bit storage, check bit 
generation, and error detection and correction. 

The instructions used for these maintenance mode functions are: 

Octal Code Description 

001501 Set maintenance read mode 

001511 Load diagnostic check byte with S1 

001521 Set maintenance write mode 1 

001531 Set maintenance write mode 2 

073i31 Clear all maintenance modes 

These instructions are all executed in monitor mode, and for instructions 
0015xx, the maintenance mode switch (located on the mainframe's control 
panel) must be on or the instructions become no-ops. 



VERIFICATION OF CHECK BIT STORAGE 

To verify the storage ability of the SECDED check bits without moving 
memory modules, instructions 001501 and 001521 are used. 

The maintenance write mode 1 instruction, 001521, replaces the 8 check 
bits generated by the SECDED circuitry with specific bits of a data word 
as it is written into memory. The maintenance read mode instruction, 
001501, complements the write instruction by replacing the same bits of a 
data word with the 8 check bits as it is read from memory. 

By using the instructions together (and with error correction disabled 
through the switch on the mainframe's control panel), specified bits of a 
data word are stored and read back through the check bit storage paths 
and verification of SECDED check bit storage operation is accomplished. 

Instruction 001521, maintenance write mode 1, and 001501, maintenance 
read mode, replace data bits with check bits and vice versa as follows. 

Data Bit                 Check Bit 

   46                        0 

   47                        1 

   62                        2 

   63        Read →          3 

   14        ← Write         4 

   15                        5 

   30                        6 

   31                        7 
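
A test program has to move the check byte into and out of these data bit positions. The Python sketch below illustrates the mapping only; it assumes bit 0 is the least significant bit of the word and that the first row of the table pairs data bit 46 with check bit 0. The function names are hypothetical.

```python
# Data bit position that carries each check bit (index = check bit number).
DATA_BIT_FOR_CHECK_BIT = [46, 47, 62, 63, 14, 15, 30, 31]


def extract_check_byte(word):
    """Collect the 8 check bits from a word read in maintenance read mode."""
    byte = 0
    for check_bit, data_bit in enumerate(DATA_BIT_FOR_CHECK_BIT):
        byte |= ((word >> data_bit) & 1) << check_bit
    return byte


def embed_check_byte(word, byte):
    """Place byte in the data bits that maintenance write mode 1 stores
    through the check-bit path; other bits of word are left unchanged."""
    for check_bit, data_bit in enumerate(DATA_BIT_FOR_CHECK_BIT):
        word &= ~(1 << data_bit)
        word |= ((byte >> check_bit) & 1) << data_bit
    return word & ((1 << 64) - 1)
```

Writing a pattern with embed_check_byte in write mode 1 and recovering it with extract_check_byte in read mode corresponds to the storage check described above.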



VERIFICATION OF CHECK BIT GENERATION 



The maintenance read mode instruction, 001501, is used to verify the 
correct generation of SECDED check bits for a word of data. 

When the instruction is executed, the 8 check bits for SECDED replace 
specific data bits as the word is read from memory. A test program can 
easily extract these check bits and verify their correctness, thus 
checking the accuracy of the SECDED check bit circuitry. 

Since the CPU replaces the data bits with check bits on all reads to 
memory until instruction 073i31 is executed (including fetch, scalar 
and vector reads, and I/O for the CPU), the test program should initially 
rewrite all of memory using the 001501 instruction to set up the SECDED 
check bits for a subsequent read by fetch or I/O. 

Error correction must be disabled during this test. 



VERIFICATION OF ERROR DETECTION AND CORRECTION 

The maintenance write mode 2 instruction, 001531, and the load diagnostic 
check byte with S1 instruction, 001511, are used to verify operation of 
the SECDED circuitry. 

To verify operation, a diagnostic check byte is initially loaded with the 
high-order bits of register S1 through instruction 001511 as follows: 

S1 Bit       Diagnostic Check Bit 

  56                  0 
  57                  1 
  58                  2 
  59                  3 
  60                  4 
  61                  5 
  62                  6 
  63                  7 

This diagnostic check byte is then written into memory in place of the 
normal SECDED check bits on any subsequent CPU write to memory (writes 
from I/O through this CPU are not affected). With error correction 
enabled (through the switch on the mainframe's control panel), a 
subsequent read of the memory location allows different paths within the 
error detection and correction circuitry to be checked out. 

The diagnostic check byte retains its value until a new one is entered. 
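
The loading of the diagnostic check byte can be sketched as follows. This 
is a hypothetical Python model, not Cray code. It assumes that bit n of 
the source S register carries weight 2**n and that bit 56 supplies 
diagnostic check bit 0. The flip_check_bit helper shows one way a test 
program might prepare a check byte that differs from a known-good one in 
a single position; the known-good value itself has to be obtained 
elsewhere, for example through maintenance read mode. 

```python
def diagnostic_check_byte(s_reg):
    """Return the 8-bit diagnostic check byte held in register bits 56-63."""
    return (s_reg >> 56) & 0xFF


def s_reg_for_check_byte(check_byte, low_bits=0):
    """Build a 64-bit register value whose bits 56-63 hold check_byte.

    low_bits fills bits 0-55, which the sketch treats as ignored.
    """
    return ((check_byte & 0xFF) << 56) | (low_bits & ((1 << 56) - 1))


def flip_check_bit(check_byte, bit):
    """Invert one check bit (0-7) to simulate a single check-bit error."""
    return (check_byte ^ (1 << bit)) & 0xFF
```

For example, s_reg_for_check_byte(flip_check_bit(good, 3)) yields a 
register value that would store a check byte with check bit 3 inverted, 
where good is a check byte already known to be correct for the data word. 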



CLEARING MAINTENANCE MODE FUNCTIONS 

Instruction 073i31 (clear all maintenance modes) clears the modes set by 
the following maintenance mode instructions: 

Octal Code Description 

001501 Set maintenance read mode 

001521 Set maintenance write mode 1 

001531 Set maintenance write mode 2 

A Master Clear also clears these maintenance modes. 

As a side effect of the 073i31 instruction, Si is loaded with all ones. 

INDEX 



1-parcel instruction format 

with combined j and k fields, 5-2 

with discrete j and k fields, 5-1 
100 Mbyte per second channel, 2-14 
1250 Mbyte per second channel, 2-14 
2-parcel instruction format 

with combined i, j, k, and m fields, 5-4 

with combined j, k, and m fields, 5-3 
6 Mbyte per second channel, 2-15 

data bits, B-1, B-4 

descriptions, B-1 

disconnect signal, B-3, B-5 

input channel error conditions, 2-19 

input channel programming, 2-17 

input channel signal exchange, B-2 

input signal sequence, B-1 

instructions, 2-17 

I/O interrupts, 2-17 

I/O program flowchart, 2-18 

multi-CPU programming, 2-16 

operation, 2-17 

output channel signal, B-4 

output programming, 2-19 

output signal sequence, B-3 

parity bits, B-2, B-5 

programmed master clear to external 
device, 2-19 

ready signal, B-3, B-5 

resume signal, B-3, B-5 

word assembly/disassembly, 2-17 
8-bit check byte, 2-9 
8-bit Status register, 4-8 



A registers, 4-3 

Access priorities, Central Memory, 2-7 

Access time, memory, 2-1 

Active Exchange Package, 3-14 

Addition algorithm, 4-28 

Addition, floating-point, 4-28 

Address Add functional unit, 4-14 

Address functional units, 4-14 

Address Multiply functional unit, 4-15 

Address processing, 4-1 

Address registers, 4-3 

Addressing, memory, 2-3 

Algorithm 

addition, 4-28 

derivation of division, 4-31 

division, 4-30 

multiplication, 4-28 
AND function, 4-36 



Arithmetic 

floating-point, 4-22 

integer, 4-21 

operations, 4-21 
Auxiliary I/O Processor (XIOP), 1-10 



B registers, 4-6 

Bank busy conflict, 2-7 

Banks, 2-1 

Beginning Address register, 3-3 

Bidirectional Memory Mode (BDM) flag, 3-9 

Bidirectional memory references, 2-4 

BIOP, see Buffer I/O Processor 

Block reads and writes, concurrent, 2-6 

Block transfer references, 2-5 

Branching, forward and backward, 3-4 

Buffer I/O Processor (BIOP), 1-10 

Buffers, instruction, 3-3 



CA register, see Current Address register 
Central Memory 

access ports, 2-5 

access priorities, 2-7 

error correction, 2-7 

access time, 2-1 

access, 2-7, 2-20 

addressing, 2-3 

banks, 2-1 

conflict resolution, 2-7 

cycle time, 2-1 

features, 2-1 

organization, 2-2 

ports, 2-3 

references per clock period, 2-2 

sections, 2-2 

size, 2-1 

transfer rates, 2-1 

types of conflict, 2-5 

word size, 2-1 
Central Processing Unit 

computation section, 4-1 

control and data paths, 1-6 

control sections, 3-1 

input/output section, 2-13 

instructions, 5-1 

shared resources, 2-1 

speed, 1-3 
Channel (see also 6 Mbyte per second 
channel) 

100 Mbyte per second, 2-14 

1250 Mbyte per second, 2-14 

6 Mbyte per second, 2-15 

features, 2-15 

groups, 2-20 

input/output data paths, 2-22 

I/O control, 2-21 

numbers, 2-20 

types, 1-3 
Channel Limit Address register (CL), 2-17 
Characteristics of system, 1-3 
Check bits, 2-8 
CIP register, see Current Instruction 

Parcel register 
CL register, see Channel Limit register 
Clear programmable clock interrupt request, 

3-21 
Clearing maintenance mode functions, D-3 
CLN register, see Cluster Number register 
Clock 

programmable, 3-19 

real-time, 2-10 
Clock period, 1-4 

Cluster Number (CLN) register, 2-11, 3-13 
Communication, inter-CPU, 2-11 
Computation section, characteristics, 4-2 
Concurrent reads and writes, block, 2-4 
Condensing units, 1-12 
Configurations of system, 1-15 
Conflict resolution, Central Memory, 2-7 
Conflicts 

bank busy, 2-7 

memory bank, 2-23 

section access, 2-7 

shared register and semaphore, 2-13 

simultaneous bank, 2-7 
Control and data paths for a single CPU, 1-6 
Conventions, 1-1 
Correctable Memory Error Mode (ICM) flag, 

3-10 
Counter, Interrupt Countdown (ICD), 3-20 
CP, see clock period 
CPU, see Central Processing Unit 
CSB (read address), 3-8 
Current Address (CA) register, 2-17 
Current Instruction Parcel (CIP) register, 

3-2 
Cycle time, 2-1 



Data Base Address (DBA) register, 3-18 
Data format 

floating-point, 4-22 

integer, 4-21 
Data Limit Address (DLA) register, 3-18 
Deadlock (DL) flag, 3-11 
Deadstart sequence, 3-21 
Decimal equivalents, 4-22 

Derivation of the division algorithm, 4-31 
DIOP, see Disk I/O Processor 
Disk controller unit (DCU), 1-10 
Disk I/O Processor (DIOP), 1-10 
Disk storage units, 1-10 
Division algorithm, 4-30 



Division algorithm, derivation of, 4-31 
Double-precision numbers, 4-27 



E (error type), 3-8 

Enable Second Vector Logical (ESVL), 3-11 
Enhanced Addressing Mode (EAM), 3-13 
Error correction, see also SECDED 

Central Memory, 2-7 

matrix, 2-9 
Error Exit (EEX) flag, 3-12 
Errors, floating-point range, 4-23 
Exchange 

initiating, 3-15 

mechanism, 3-5 
Exchange Address (XA) register, 3-5, 3-12 
Exchange Package, 3-5 

active, 3-14 

assignments, 3-6 

contents, 3-6 

enable Second Vector Logical, 3-11 

memory error data, 3-7 

management, 3-16 

processor number, 3-7 

vector not used (VNU), 3-10 
Exchange sequence, 3-14 

Exchange sequence issue conditions, 3-15 
Exclusive NOR function, 4-36 
Exclusive OR function, 4-36 
Execution interval, 3-15 
Exponent matrix for Floating-point Multiply 

unit, 4-25 
External Interrupts flag, 3-10 



F register, see Flag register 

Fetch 

following scalar store, 2-6 
request, 2-5 

Flag (F) register, 3-11 

Flags 

Bidirectional Memory Mode (BDM), 3-9 
Correctable Memory Error Mode (ICM), 

3-10 
Deadlock (DL), 3-11 
Error Exit (EEX), 3-12 
Exchange register flags, 3-11 
External Interrupts, 3-10 
Floating-point Error (FPE), 3-11 
Floating-point Error Mode (IFP), 3-10 
Floating-point Error Status (FPS), 3-9 
I/O Interrupt (IOI), 3-12 
Interrupt from Internal CPU (ICP), 3-11 
Interrupt Monitor Mode (IMM), 3-9 
MCU Interrupt (MCU), 3-11 
Memory Error (ME), 3-12 
Monitor Mode (MM), 3-10 
Normal Exit (NEX), 3-12 
Operand Range Error (ORE), 3-12, 3-19 
Operand Range Error Mode (IOR), 3-10 
Program Range Error (PRE), 3-12, 3-19 
Programmable Clock Interrupt (PCI), 3-11 
Selected for External Interrupts (SEI), 
3-9 
Semaphore, 3-9 

Uncorrectable Memory Error Mode (IUM), 
3-10 

Waiting for Semaphore (WS), 3-9 
Floating-point 

Add functional unit, 4-19 

Add functional unit range error, 4-23 

addition, 4-28 

arithmetic, 4-22 

data format, 4-22 

Error (FPE) flag, 3-11 

Error Mode (IFP) flag, 3-10 

Error Status (FPS) flag, 3-9 

functional units, 4-19 

integer multiply, 4-28 

Multiply functional unit, 4-20, 4-24 

multiply partial-product sums pyramid, 
4-29 

normalized numbers, 4-23 

range errors, 4-23 

range overflow, 4-23 

Reciprocal Approximation functional 
unit, 4-27 

subtraction, 4-28 
Forward and backward branching, 3-4 
Full Vector Logical functional unit, 4-17 
Functional units, 4-14 

Address, 4-14 

floating-point, 4-19 

Floating-point Add, 4-19, 4-24 

Floating-point Multiply, 4-19, 4-24 

Floating-point Reciprocal 
Approximation, 4-27 

Full Vector Logical, 4-17 

Reciprocal Approximation, 4-20 

scalar, 4-15 

Scalar Add, 4-15 

Scalar Logical, 4-16 

Scalar Population/Parity/Leading Zero, 
4-16 

Scalar Shift, 4-16 

Second Vector Logical, 4-18 

vector, 4-16 

Vector Add, 4-17 

Vector Logical, 4-17 

Vector Population/parity, 4-18 

vector reservation, 4-17 

Vector Shift, 4-17 



g field, 5-1 

General form for instructions, 5-1 

Group descriptions, performance counter, C-2 



h field, 5-1 

Half-precision multiply, 4-29 



ICD, see Interrupt Countdown counter 

II register, see Interrupt Interval register 

ILA register, see Instruction Limit Address 

register 
In-buffer condition, 3-4 
Inclusive OR function, 4-36 
Index generation, 4-17 
Input/output 

channel, references, 2-3 

channel types, 1-3 

data paths, 2-22 

Interrupt (IOI) flag, 3-12 

interrupt, 2-16 

lockout, 2-23 

memory addressing, 2-24 

memory conflicts, 2-23 

memory request conditions, 2-23 

priority, 2-8 

processors, types of, 1-8 

program flowchart, 2-18 

section, 2-14 

Subsystem, 1-8 

Subsystem, data transfer, 2-14 
Input signal sequence, 6 Mbyte per second 

channel, B-1 
I/O, see Input/output 
I/O Subsystem, data transfer, 2-14 
Instruction 

Base Address (IBA) register, 3-17 

buffers, 2-1, 3-3 

descriptions, 5-6 

fetches, 2-6 

issue, 5-5 

issue and control elements, 3-1 

issue to memory ports, 2-5 

Limit Address (ILA) register, 3-18 

summary for CRAY X-MP four-processor 
computer systems, A-1 

parcel, 3-1 
Instruction formats 

1-parcel, 5-1, 5-2 

2-parcel, 5-3, 5-4 
Integer arithmetic, 4-21 
Integer data formats, 4-21 
Inter-CPU 

communication section, 2-10 

priority, 2-8 
Interfaces, 1-7 
Intermediate registers, 4-3 
Interrupt 

Countdown (ICD) Counter, 3-20 

from Internal CPU (ICP) flag, 3-11 

Interval (II) register, 3-20 

Monitor Mode (IMM) flag, 3-9 
Intra-CPU priority, 2-8 
Issue, 3-2 



j field, 5-1 



i field, 5-1 

IBA register, see Instruction Base Address 

register 
IBAR registers, see Beginning Address 

registers 



k field, 5-1 

Limit Address (CL) register, 2-17 

Logical operations, 4-36 

Lower Instruction Parcel (LIP) register, 3-3 



m field, 5-1 

Mask operation, 4-36 

Master Clear sequence, to external devices, 

2-19 
Master I/O Processor (MIOP), 1-8 
MCU interrupt (MCU) flag, 3-11 
Memory, see also Central Memory 

addressing, I/O, 2-24 

bank conflicts, 2-23 

conflicts, I/O, 2-23 

Error (ME) flag, 3-12 

error correction, see SECDED 

error data fields, 3-7 

field protection, 3-17 

field registers, 3-9 

request conditions, I/O, 2-23 
MIOP, see Master I/O Processor 
Mode (M) register, 3-9 
Monitor Mode (MM) flag, 3-10 
Motor-generator units, 1-14 
Multi-CPU programming, 2-16 
Multiplication algorithm, 4-28 
Multiply, half-precision, 4-29 
Multiply pyramid, 4-29 



Newton's method, 4-31 

Next Instruction Parcel (NIP) register, 3-2 

Normal Exit (NEX) flag, 3-12 

Normalized floating-point numbers, 4-23 

Notation conventions, 1-1 

Numbers 

double-precision, 4-27 
normalized floating-point, 4-23 



Operand Range Error 

flag (ORE), 3-12, 3-19 
Mode (IOR) flag, 3-10 

Operating registers, 4-3 

Organization 

system, 1-5 
memory, 2-2 

Out-of-buffer condition, 3-4 



P register, see Program Address register 
Parallel vector operations, 4-11 
Parity error, 2-19 
Performance 

counter group descriptions, C-2 

monitor, 3-21, C-1 
Power distribution units, 1-13 
Priority 

inter-CPU, 2-10 

intra-CPU, 2-8 
Processor number, 3-7 

Program Address (P) register, 3-2, 3-9 
Program Range Error (PRE) flag, 3-12, 3-19 



Program State (PS) register, 3-13 
Programmable clock 

instructions, 3-19 

Interrupt (PCI) flag, 3-11 
Programmed Master Clear to external device, 
2-19 



R (read mode), 3-8 

Reading performance results, C-3 

Real-time Clock (RTC) register 

instructions, 2-10 
Reciprocal Approximation functional unit, 

4-20 
References, memory, 2-3 
Registers 

8-bit Status, 4-8 

A, address, 4-3 

B, 4-6 

Beginning Address, 3-3 
Channel Limit Address (CL), 2-17 
Cluster Number (CLN), 2-12, 3-13 
Current Address (CA), 2-17 
Current Instruction Parcel (CIP), 3-2 
Data Base Address (DBA), 3-18 
Data Limit Address (DLA), 3-18 
designators, 5-7 

Exchange Address (XA), 3-5, 3-12 
Flag (F), 3-11 

Instruction Base Address (IBA), 3-17 
Instruction Limit Address (ILA), 3-17 
Intermediate, 4-3 
Interrupt Interval (II), 3-20 
Limit Address (CL), 2-17 
Lower Instruction Parcel (LIP), 3-3 
memory field, 3-9 
Mode (M), 3-9 

Next Instruction Parcel (NIP), 3-2 
operating, 4-3 

Program Address (P), 3-2, 3-9 
Program State (PS), 3-13 
Real-time Clock (RTC), 2-10 
S, Scalar, 4-7 
Semaphore, 2-12 
shared, 2-11 

Shared Address (SB), 2-11 
Shared Scalar (ST), 2-11 
status, 4-8 
T, 4-9 

V, vector, 4-9 
Vector Control, 4-13 
Vector Length (VL), 4-13 
Vector Mask (VM), 4-13 
Reservations and chaining, V register, 4-12 



S (syndrome), 3-8 

S registers, 4-7 

Scalar 

Add functional unit, 4-15 
functional units, 4-15 
Logical functional unit, 4-16 
memory references, 2-5 
Population/parity/leading zero 
functional unit, 4-16 
registers, 4-6 

Shift functional unit, 4-16 
SECDED, 2-8 

maintenance functions, D-1 

memory data path, 2-8 
Second Vector Logical functional unit, 4-17 
Section access conflict, 2-7 
Sections, 2-2 
Selected for External Interrupts (SEI) 

flag, 3-10 
Selecting performance events, C-1 
Semaphore registers, 2-12 
Shared 

register and semaphore conflicts, 2-13 

registers, 2-11 

resources, 2-1 
Shared Address (SB) registers, 2-11 
Shared Scalar (ST) registers, 2-11 
Simultaneous bank conflict, 2-7 
Solid-state Storage Device, 1-11 

data transfer, 2-14 

chassis, 1-11 
Special characters, 5-7 
Special register values, 5-5 
SSD, see Solid-state Storage Device 
Status register, 4-8 
Syndrome, 2-8 
System 

basic organization, 1-5 

block diagram with block multiplexer 
channels, 1-16 

block diagram with full disk capacity, 
1-15 

characteristics, 1-3 

components, 1-5 

configuration, 1-15 

description, 1-1 

physical characteristics, 1-3 



T registers, 4-9 

Testing performance counters, C-4 

Time slot, 2-23 

Transfer rate 

instruction buffers, 2-1 

I/O section, 2-1 
Twos complement integer arithmetic, 4-21 



Uncorrectable Memory Error Mode (IUM) flag, 

3-10 
Unexpected Ready signal, 2-20 



V register reservations and chaining, 4-12 

V registers, 4-9 
Vector 

Add functional unit, 4-17 
control registers, 4-13 
functional unit reservation, 4-17 
functional units, 4-16 
Length (VL) register, 4-13 
Logical functional units, 4-18 

Mask (VM) register, 4-13 

not used (VNU), 3-10 

Population/parity functional unit, 4-19 

registers, 4-9 

Shift functional unit, 4-17 
Verification of 

check bit generation, D-2 

check bit storage, D-1 

error detection and correction, D-2 
VNU - vector not used, 3-10 



Waiting for Semaphore (WS) flag, 3-9 
Word assembly/disassembly for 6 Mbyte per 

second channel, 2-17 
Word size, memory, 2-1 



XA register, see Exchange Address register 
XIOP, see Auxiliary I/O Processor 